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CHAPTER  1 


INTRODUCTION 

This  research  investigates  methods  for  estimating 
the  position  of  a moving  source  by  the  processing  of  an 
acoustically  radiated  signal  received  at  two  or  more 
physically  separated  sensors.  If  the  source  signal 
is  received  at  two  geographical  positions  in  the  presence 
of  uncorrelated  noise,  then,  depending  on  the  signal 
strength  and  duration,  it  is  possible  to  estimate  the 
bearing  to  the  source  relative  to  the  sensor  baseline. 

When  the  source  signal  is  received  at  three  sensors, 
range,  as  well  as  bearing  to  the  source,  can  be  estimated 
by  using  the  intersection  of  two  bearing  lines  of  position 
(LOPs).  The  mathematics  for  the  solution  to  the  problem 
of  finding  the  "best"  estimate  of  bearing  is  analogous 
to  the  more  general  problem  of  estimating  the  time  delay 
(or  group  delay)  between  two  time  series.  Therefore, 
this  dissertation  derives  the  maximum  likelihood  (ML) 
estimate  time  delay. 

Techniques  for  estimation  of  time  delay  can  be 
applied  to  a variety  of  practical  problems,  in  addition 
to  those  motivating  this  researcn.  For  example,  if  we 
consider  a signal  which  probes  a linear  time  invariant 
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system,  then  the  problem  of  estimating  time  delay  can 
be  viewed  as  attempting  to  identify  a parameter  of  the 
probed  system,  based  on  time-limited,  noise-corrupted 
observations  of  the  system  input  and  output.  The  delay 
is  a particularly  valuable  characterization  of  the  system 
(and  interrelationship  between  two  processes)  when  the 
system  output  is  an  attenuated  and  delayed  version  of  the 
input.  Physical  plants  in  which  delays  occur  can  also  be 
visualized  in  terms  of  the  bearing  estimation  problem. 

For  example,  consider  two  geographically  separated 
sensors  that  receive  a signal  from  an  acoustically 
radiating  point  source,  as  shown  in  Figure  1-1.  If 
the  properties  of  the  medium  are  such  that  the  signal 
from  the  source  propagates  at  a constant  velocity,  then 
the  travel  time  from  the  source  to  either  sensor  is 
directly  proportional  to  the  distance  traveled.  Thus, 
the  difference  between  the  travel  time  (from  the  source 
to  each  sensor)  or  time  delay  is  given  by  the  difference 
in  path  lengths  divided  by  the  propagation  velocity. 

There  exists  a well  defined  locus  of  points  (relative  to 
the  sensors)  for  which  the  time  delay  is  constant.  Hence, 
knowledge  of  the  time  delay  is  sufficient  to  dictate 
that  the  source  is  located  somewhere  on  that  locus  of 
points.  In  particular,  the  acoustic  source  must  be 
located  on  the  locus  of  points  that  satisfies  the  constant 
time  delay  constraint,  namely,  the  hyperbola  in 
Figure  1-2  • The  bearing  angle,  0,  that  the  hyperbolic 
asymptote  makes  with  the  baseline  is  a good  approximation 
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to  the  true  bearing  to  the  source  (relative  to  the 
midpoint  of  the  baseline)  especially  for  distant  sources. 
Thus,  by  making  a distant  point  source  (or  equivalently 
a plane  wave)  assumption  and  solving  for  the  bearing 
angle  0,  one  is  equivalently  finding  the  angle  that  the 
hyperbolic  asymptote  (or  line  of  position  (LOP))  makes 
with  the  baseline.  Familiarity  with  hyperbola  suggests 
that  the  source  need  not  be  very  distant  (relative  to 
the  sensor  separation  d)  in  order  for  the  arrival  angle 
to  be  a good  estimator  of  true  source  angle.  In  the 
estimation  problem,  the  receivers  are  attempting  to 
estimate  bearing  (or  position)  of  a source  that  is 
radiating  a signal  either  intentionally  or  unintentionally. 
During  intentional  radiation  (for  example,  a communica- 
tions system)  signal  statistics  are  selectable  within 
certain  practical  and  regulatory  limitations.  In  other 
applications,  the  signal  characteristics  are  unknown 
and  the  output  of  the  sensors  must  be  processed  without 
this  a priori  knowledge  in  order  to  estimate  time  delay 
or  equivalently  source  bearing.  In  this  thesis  it  is 
assumed  that  the  source  characteristics  are  not  under 
the  control  of  the  designer  and  at  best  the  spectral 
characteristics  of  the  signal  are  known  or  estimated. 

The  time  delay  estimation  research  piesented  in 
this  text  is  arranged  in  six  chapters  and  four  appendices. 
Because  the  estimation  of  time  delay  and  bearing  is 
intimately  related  to  the  coherence  between  two  received 
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waveforms,  an  extensive  invest i^at ion  of  coherence  is 
given  (in  Chapter  2).  New  results  on  using  coherence' 
to  provide  information  about  linear  and  nonlinear 
system  identification  are  presented  and  proved.  Among 
other  results,  Chapter  2 explicitly  shows  how  the 
signal-to-noise  ratio  (SNR)  is  a function  of  coherence. 

In  Chapter  3,  the  ML  estimate  of  time  delay 
between  two  signals  is  derived  under  jointly  stationary 
Gaussian  assumptions.  The  explicit  dependence  of  the 
time  delay  estimate  on  coherence  is  evident  in  the 
estimator  realization  in  which  the  two  time  series  are 
prefiltered  (to  accentuate  frequency  bands  according 
to  the  strength  of  the  coherence)  and  subsequently 
crosscorre  .ated.  The  time  argument  at  vihich  the 
generalized  crosscorrelation  (GCC)  function  peaks  is 
the  time  delay  estimate  (Carter  and  Knapp  (1976a)). 

The  method  of  derivation  is  akin  to  the  ML  bearing 
estimate  derived  by  MacDonald  and  Schultheiss  (1969) 
with  two  exceptions:  (1)  the  technique  here  requires 

no  plane  wave  assumption  but  finds  the  ML  estimate  of  the 
more  general  time  delay  parameter,  from  which  one  can 
e.stimate  both  the  hyperbolic  LOP  and  source  bearing, 
and,  (2)  the  derivation  here  does  not  constrain  the 
additive  noise  waveforms  at  different  sensors  to  have 
the  same  spectral  characteristics.  These  conditions 
allow  for  widely  spaced  sensors  since  the  spectral 
characteristics  of  the  noise  can  be  different  and  the 


signal  wavefront  is  not  constrained  to  be  planar. 

Having  derived  the  ML  estimate  of  time  delay, 
we  show  that  it  is  equivalent  to  the  GCC  function  with 
prefiltering  suggested  by  Hannan  and  Thomson  (1973). 
Although  the  ML  estimator  is  the  same  as  the  method 
suggested  by  Hannan  and  Thomson  (1973),  this  could 
not  have  been  accurately  predicted  ahead  of  time. 

The  Hannan  Thomson  (HT)  processor  was  obtained  as  a 
GCC  function  with  optimally  determined  weighting. 

In  related  work,  Clay.  Hinich,  and  Shaman  (1973)  arrived 
at  a less  general  ML  estimate  for  b.-  aring,  due  to  the 
assumption  that  the  signal  spectral  characteristics 
were  flat  in  the  frequency  band  of  interest.  The  results 
of  this  thesis  are  also  more  general  than  those  of 
MacDonald  and  Schultheiss  (1969)  because  there  is 
no  signal  plane  wave  assumption  and  the  noise  spectral 
characteristics  can  differ  from  sensor  to  sensor. 

When  the  received  signal  and  noise  waveforms  are 
stationary  and  Gaussian  with  known  spectral  characteristic 
it  is  shown  that  the  ML  estimate  of  time  delay  achieves 
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the  Cramer-Rao  bound.  Thus,  the  ML  estimate,  in  this 
case,  achieves  a variance  less  than  or  equal  to  that 
attained  by  any  other  means.  Two  realizations  of  the 
time  delay  estimate  are  given:  the  first,  uses  the 

GCC  function  with  appropriate  prefilters;  the  second 
appropriately  filters,  sums,  squares,  and  averages  as 
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suggested  by  Carter  and  Knapp  (1976a).  Further,  when 
the  spectral  characteristics  are  known  the  variance  of 
the  delay  estimates  is  derived  for  all  GCC  processors. 
When  the  signal  and  noise  spectral  characteristics  are 
unknown,  as  is  often  the  case  in  the  passive  bearing 
estimation  problem,  it  is  suggested  that  an  approximate 
technique  be  used,  whereby  estimates  of  the  ML  weighting 
are  inserted  in  the  place  of  the  correct  weighting. 

This  heuristic  procedure  will  converge  to  the  ML  estimate 
provided  the  weighting  is  properly  estimated.  The 
appendices  summarize  woik  in  this  area  by  Carter,  Knapp, 
and  Nuttall  (1973a)  to  estimate  the  spectral  densities 
including  coherence.  (Details  of  the  appendices  are 
discussed  later  in  the  introduction.) 

In  Chapter  4,  the  variances  of  six  proposed  time 
delay  estimates,  including  ones  suggested  by  Roth  (1971) 
and  Carter,  Nuttall,  and  Cable  (1973),  are  compared  for 
an  example  case  where  the  signal  and  noise  have 
rectangular  spectra  with  different  bandwidths.  The 
results  confirm  the  advantages  of  ML  time  delay 
estimation . 

The  estimation  formulation  is  extended,  in  Chapter 
5,  to  three  important  generalizations:  multiple  sources, 
moving  source,  and  multiple  sensors.  The  multiple 
source  problem  introduces  a new  term  in  the  award 
function  which  was  maximized  in  Chapter  3 to  obtain  a 
single  time  delay  estimate.  This  additional  term  is  the 
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information  between  two  processes.  Nettheim  (1966), 
using  results  of  Gelfand  and  Yaglom  (1959),  has 
shown  the  Shannon  (1949)  definition  of  information  to 
be  directly  related  to  the  coherence  between  two 
processes.  Thus,  as  with  the  single  time  delay 
estimation  problem,  coherence  play-?  an  important  role. 
Source  motion  significantly  complicates  tie  bearing 
estimation  problem  as  Indicated  in  section  B of  Chapter  5. 
Indeed,  unless  some  preprocessing  is  done,  the  received 
waveforms  appear  uncorrelated  despite  the  presence  of  a 
common  but  time  compressed  (or  less  generally,  Doppler 
shifted)  signal.  A method  based  on  the  ideas  of  Chapter 
3 is  suggested  for  preprocessing  the  received  waveforms 
to  remove  the  effect  of  source  motion.  The  last  section 
of  Chapter  5 extends  the  filter  and  sum  realization  for 
time  delay  estimation  to  a multiple  sensor  environment. 
Finally,  Chapter  6 is  a brief  discussion  and  summary  of 
applications  for  the  methods  of  time  delay  estimation 
and  suggestions  for  future  work. 

The  appendices  of  this  dissertation  are  provided  to 
imnlement  and  corroborate  the  theory  developed  in  Chapter  3. 
Appendix  A summarizes  two  methods  of  spectral  estimation 
given  in  Carter,  Knapp,  and  Nuttall  (1973a)  and  Carter 
and  Knapp  (1975).  Appendix  B gives  important  results 
of  the  statistical  behavior  of  the  estimates  of  the 
magnitude-squared  coherence  (MSC) , including  the 
probability  density  function  (pdf),  the  cumulative 
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distribution  function  (cdf),and  the  m-th  moment  of  the 
MSC  estimate.  A complete  discussion  of  the  bias  and  the 
variance  of  the  MSC  estimates  is  presented,  including 
a simulation  (done  by  Nuttall  and  Carter  (1976b))  that 
supports  theoretical  results  of  Haubrich  (1965)  and 
Carter,  Knapp,  and  Nuttall  (1973a)  and  refutes  past 
simulation  results  of  Benignus  (1969a).  Using  a method 
suggested  by  Benignus  (1969a),  a reduced  bias  method 
of  MSC  estimation  is  verified;  however,  it  is  discovered 
that  for  many  practical  estimation  situations  the  reduced 
bias  MSC  estimator  will  have  increased  mean  square  error 
(MSB)  when  compared  with  the  MSC  estimator  given  in 
Appendix  A.  An  example  is  given  of  erroneous  simulation 
results  (in  particular,  unexpectedly  large  bias)  when 
the  assumptions  of  the  theory  have  been  violated. 

In  the  process  of  detecting  a coherent  source 
it  is  desirable  to  establish  a threshold  above  which  a 
source  is  considered  detected.  Rules  for  establishing 
such  a threshold  are  given  (Carter  (1976))  in  order  to 
achieve  a specified  probability  of  false  alarm.  Having 
established  such  a threshold,  it  is  possible  to  determine 
the  probability  of  detecting  a coherent  source;  the 
probability  of  detection  will  depend  both  on  the 
observation  time  and  the  underlying  strength  of  the 
coherent  source.  Example  receiver  operating  character- 
istics are  plotted  for  different  observation  times  and 
coherent  source  levels. 
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Appendix  C gives  a complete  FORTRAN  IV  computer 
listing  of  a program  to  estimate  time  delay.  The 
program  was  successfully  compiled  and  run  on  both  a 
Univac  and  an  IBM  computer.  Appendix  D presents  an 
example  case  to  validate  both  the  theory  and  the  computer 
program . 

The  text,  then,  is  arranged  as  follows;  Chapter  3 
contains  the  derivation  for  the  ML  time  delay  estimator; 
because  these  results  depend  on  the  coherence  between 
two  random  processes,  we  first  demonstrate  in  Chapter  2 
what  characteristics  the  coherence  possesses.  Chapter  4 
compares  the  ML  estimator  derived  in  Chapter  3 with 
other  proposed  methods  for  estimating  time  delay. 

Chapter  5 extends  the  results  of  Chapter  3 to  three 
important  generalizations;  multiple  sources,  moving 
source. and  multiple  sensors.  Applications  and  a general 
discussion  are  presented  in  Chapter  6.  The  four 
appendices  are  all  concerned  with  experimental 
verification  of  approximate  methods  for  estimating  time 
delay  presented  in  Chapter  3. 


CHAPTER  2 


THEORY  AND  APPLICATIONS  OF  COHERENCE 


The  solution  to  the  physical  problem  of  estimating 
source  bearing  is  intimately  tied  to  the  coherence 
between  spatially  separated  passive  sensors. 

This  chapter  presents  the  definition  and  properties 
of  the  coherence  and  several  new  results  on  its  use. 

These  results  bear  both  directly  and  indirectly  on  the 
solution  to  the  optimum  delay  estimation  problem. 

2A.  Definition.  Relationship  to  Crosscorrelation,  and 

Properties 
2A1.  Definition 

The  coefficient  of  coherency  (CC)  between  two 
wide  sense  stationary  random  processes  is  the  normalized 
cross  power  spectral  density  function  defined  by  Weiner 
(1930)  as  G (f) 

---  . (2-1) 


X " 

*12 


VS*1 


(f)  G (f) 
^2  2 


where  f denotes  the  frequency  (Hz),  G (f)  is  the  cross - 

*12 

power  spectrum  between  x.(t)  and  x„(t),  and  G (f), 

1.  dt  X 

G (f)  denote  the  auto  powe.  spectra  of  x.,(t),  x„(t), 

*2  2 ^ ^ 
respectively.  Despite  some  coni  sion  in  the  literature, 

Weiner  intended  for  the  CC  to  be  complex.  This  is 


12 


13 


apparent  since  he  discusses  (p.  194,  Weiner  (1930)) 
both  the  modulus  and  the  argument  of  the  CC.  Moreover, 
in  suggesting  how  one  might  compute  the  CC,  the  modulus 
of  the  complex  numerator  is  not  indicated.  The  CC  is 
also  referred  to  as  the  complex  coherence  (Carter,  Knapp, 
and  Nuttall  (1973a)).  Many  of  the  results  which  follow 
depend  on  the  magnitude-squared  of  the  CC  (MSC).  The 
MSC  is  also  referred  to  as  the  squared  coherency 
(Jenkins  and  Watts  (1968)). 

In  order  to  simplify  the  notation  throughout 
the  thesis,  we  define 


XfX2 


(f)  = 


rx  X 


(2-2) 


When  the  two  processes  under  consideration  are  apparent, 
we  further  simplify  the  notation  by  letting 


1 C 

The  magnitude  of  the  CC  (MC)  is  denoted  by 


^x  X 


(2-3) 


The  term  "coherence"  can  imply  CC,  MC  or  MSC.  Indeed, 
variables  that  are  a function  of  the  MSC  (or  MC)  alone 
are  also  functions  of  the  CC  alone,  but  not  necessarily 
vice  versa.  While  it  seems  most  natural  mathematically 
to  refer  to  the  CC  as  the  coherence,  the  majority  of  the 
literature  refers  to  the  MSC  as  coherence. 

Since  G (t)  and  G (f)  are  real,  the  phase 
*11  *22 
of  the  C 1 denoted  by 


(2-4a) 


i 

i 
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= [=X,X3<^>] 


that  is,  the  phase  of  the  CC  is  the  same  as  the  phase  of 
the  cross  spectrum.  Later  we  will  interpret  (2-4c)  as 
the  phase  of  the  optimum  linear  filter  that  maps  x^(t) 


to  X2(t) . 


2A2.  Relation  to  Crosscorrelation 


The  CC  between  x(t)  and  y(t)  can  be  confused  with 
the  crosscorrelation  coefficient  or  normalized  cross- 
correlation function  defined  for  zero  mean  processes  by 


Pxy(T)  - 


R (t) 

_ 

H (0)  R (0)1 

xx''  y/^ 


The  normalized  crosscorrelation  is  a function  of  lag  and 
not  frequency.  Further  note  that  the  normalizing  factor 
is  the  scalar  fa  (0)  R (O)]^,  independent  of  x.  It 
is  not  a lag  dependent  normalization.  The  CC  has  an 
abscissa  dependent  type  of  normalization  (2-1). 

However,  there  are  two  models  of  filtering  that  aid  in 
interpreting  tiie  CC  as  a type  of  crosscorrelation. 

In  the  first  model,  we  are  given  x(t)  and  y(t)  as 
depicted  in  Figure  2-1,  and  we  want  to  find  the  CC 
between  x(t)  and  y(t).  If  we  prefilter  x(t)  by  the 
linear  filter  Hj^(f)  and  y(t)  by  the  linear  H2(f),  then 
(from  p.  399,  Davenport  (1970))  the  cross  spectrum 
between  the  filter  outputs  is 

G.  . (f)  = G^y(f)  H^(f)H2*(f)  . 


XiVj 


(2-6) 
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vi 


Figure  2-1 


Distinct  Linear  Filters  Inputs 

x.y  and  Outputs 
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Thus,  if  we  select 
H^(f)H2*(f)  = 

it  follows  that 


nTg  (f  )G  (f) 

xx'  yy  ’ 


G (f)  = Y (f). 


(2-7) 


Thus,  the  CC  between  x(t)  and  y(t)  can  be  obtained  by 
first  prefiltering  x(t)  by  the  realizable  whitening 
filter 

H,(f)  = (2-8) 

and  prefiltering  y(t)  with  a realizable  whitening 
filter  with  the  same  phase  as  (2-8).  Namely,  we  select 

Ho(f)  = ■ --  - . (2-9) 

V Oy/t) 

Such  filtering  ensures 

'J’v  V • ^2-10) 


That  is,  the  phase  between  input  processes  is  invariant 
to  equiphase  filtering.  Then,  to  compute  the  CC  between 
x(t)  and  y(t),  we  compute  the  cross  spectrum  between 
x^(t)  and  y^(t).  This  could  be  accomplished  by  cross- 
correlating  x^(t)  and  yj^(t)  and  taking  the  inverse 
Fourier  transform  (or  see  Appendix  A), 

In  the  second  model  used  to  understand  the  CC  we 
observe  that  for  x^(t)  and  y^^(t)  (in  Figure  2-1  ) zero 


mean 
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XiYi 


(T)  % 


00 

; G ' df 

_ on  * 


.(2-11) 


f G (f ) 

XX ' ' 

— 00 


H^Cf) 


'df  /G  (f) 

yy 


H2(f)  df 


Thus  if 


H^(f)=H2(f)  = I 


ej4*(lc^  Af  I i Af 

o , elsewhere 


(2-12) 


(2-11)  becomes  (for  small  Af) 

G (f  )e'^^^^c’^+G  (-f  ) 

T)  = L -^y  xy^ 


-j2Trf  T 
e ^ c 

jAf 


XiYi 


[' 


G (f  )2Af  • G (f  )2Af 
XX  c yy'  c 


1 


(2-13a) 


’g  (f  )ej2^Vl 

L xy'  c''  J 


- Re  L xy'  c 


G (f  )G  (f  ) 
xx'  c yy  c' 


(2-13b' 


j2Tlf^T' 


(2-13C) 


= |Y^y(f^)|^os  2Tif^(T-D  )]  . (2-13d) 

The  crosscorrelation  coefficient  at  zero  argument  is 
given  by 

%y^(0)  = Re[\y(t^)]  . (2-14) 

Thus  we  see  from  (2-14) and  (2-13d)  how  the  CC  is 
related  to  the  crosscorrelation  coefficient. 

2A3.  Properties 

The  power  spectral  density  matrix  is  positive 
semidefinite  (Jenkins  and  Watts  (1968)).  Therefore, 
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for  two  random  processes,  we  see  that 


lQ^(f)l  = 


G (f)  G (f) 


G „ (f)  G (f) 
^2^1  ’^2*2 


> 0 


(2-15a) 


For  real  processes  G (f)=G*  (f)  and  thus 

^2*1 


’^1*2 


(f)G^  ^ (f)  -I  G^  ^ (f)l  ^ > 0 . 


^1*1  ' ''2*2 


*1*2 


(2-15b) 


and 


X X ^ X (^>1 

11  2 2 12 


(2-15c) 


Further,  G (f)  and  G (f)  are  nonnegative,  real 
*1*1  *22 

functions  of  1.  When  G (f),  G (f)  are  strictly 

*11  *2*2 

positive  definite  (that  is,  when  G (f)G  (f)>0), 

"l^'l  ^^2  2 

(2-15c)  can  be  divided  through  by  G (f)G  (f) 

XiXi  XgXg 

without  changing  the  sense  of  the  inequality  thereby 
yielding 


*1*2 


(f)  < 1,  Vf 


(2-16a) 


Further,  the  magnitude -squared  of  any  complex  number  is 
gr(*ater  than  or  equal  to  zero.  Thus, 


0 1 X 1 1 • (2-16b) 

*12 

The  MSC  always  falls  between  zero  and  one.  Further, 
as  will  be  shown,  it  is  zero  if  the  processes  Xj^(t) 
and  XgCt)  are  uncorrelated;  and,  it  is  equal  to  unity 
if  there  exists  a linear  relation  between  Xj^(t)  and 
XgCt).  The  cross-power  spectrum  is  then  defined  by 
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Hannan  and  Thomson  (1973) as 


(f)  = 


= .to 

\ X^3 


JfjG. 


^2^2 


(f) 


X 


(2-17) 


This  definition  is  interesting  since  it  points  out  the 

importance  of  the  coherence.  It  should  be  noted  that  if 

Y (f)  is  undefined,  G (f)  cannot  be  computed  from 
^1^2  ^12 

(2-17)  (as,  for  example,  when  G (f)  and  G (f)  are 

^1^2  ^11 

zero).  Here  we  note  that  the  statement  C „ (f)  = 0 

’^1^2 

provides  more  information  than  the  statement 
2 

|G  V since  in  the  former  case,  both  auto- 

XfX2 

spectra  must  be  nonzero.  However,  it  may  be  more  exact 

2 

to  say  I G (f)|  is  undefined  when  no  measurement  :s 
^12 

made . 

In  order  to  define  the  MSC,  it  is  necessaiy  that 
the  numerator  and  denominator  of  that  ratio  not  be 
simultaneously  zero.  Moreover  the  MSC  will  be  undefined 
if  either  autospectra  is  zeio.  For  example,  if 
G (f)=OorG  (f)=Oit  must  be  true  from  (2-15c) 

X^Xf  XgX^ 

that  |G  „ I =0.  Hence,  it  can  be  concluded  that  if 
^12 

either  G (f)orG  (f)is  zero  over  some  frequency 
^11  ^22 

range  then  the  MSC  is  undefined  over  that  same  frequency 
range.  Further,  if  this  is  the  case,  the  power  spectral 
density  matrix  is  singular,  .\nother  property  of  the  MSC 
is  that  the  MSC  is  invariant  under  linear  transformations. 
If  :;(t)  is  filtered  by  H^(f)  and  y(t)  is  filtered  by 
H2(f)  as  depicted  in  Figure  2-1,  then 
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(f) 


= G_(f)G.„(f) 


XX 


yy 


(2-18a) 


2=  C^y(f)(2-18b) 


Gyy(f)|H2(f) 

Thus  provided  |H^( f ) ] ^ |H2( f ) ] ^ f 0 


^x^y^(f)  = C^y(f)  . (2-19) 

That  is,  the  MSC  is  the  same  between  x and  y as 
between  the  filtered  versions  x^  and  y^. 

2B.  Uses  of  Coherence  Function 
The  MSC  function  for  the  zero-mean,  wi de-sense 
stationary  processes  x(t)  and  y(t)  is  useful  in  several 
ways,  which  will  be  proved  in  the  following  sections. 
First,  for  two  independent  processes,  the  MSC  function 
is  zero.  Second,  the  MSC  measures  the  degree  of  system 
linearity.  Third,  under  the  assumptions  to  be 
presented,  the  MSC  function  serves  as  a SNR  measure. 

2B1 . Measure  of  Correlation 
THEOREM  2-1;  If  two  zero-mean  stationary  processes 
x(t)  and  y(t)  are  independent,  they  are  also  uncorrelated 
and  orthogonal ; 

= E(x(t)y(t-T)l  = Elx(t)]  E[y(t-T))  = 0,(2-20a) 

Ay 


xy 


00 

(f)  = f R^„(T)e*'^^''^^dT  = 0 , 


xy 


(2-20b) 


and  the  MSC 

C (f)  = 0 , Vf 
xy' 

provided  G^^(  f)Gyy(f)  fO. 


(2-20c) 


1 


I 

I 

r 


21 


Hence,  if  the  two  processes  are  independent  (or 
uncorrelated)  with  zero  mean,  the  MSC  between  them  is 
zero . 

DISCUSSION  OF  THEOREM  2-1;  Note  that  jointly  Gaussian 
random  processes  that  are  uncorrelated  (incoherent) 
are  also  independent.  However,  it  is  possible  for  two 
processes  to  be  highly  dependent  yet  uncorrelated 
(incoherent),  even  if  one  of  the  two  processes  is 
Gaussian.  Although  one  may  be  led  by  physical 
considerations  to  presume  processes  are  independent  and 
hence  uncorrelated,  in  practice,  it  is  easier  to  show 
processes  are  uncorrelated  than  independent . Note  that 
if  C (f)  = 0,  Vf,  it  follows  that  Re(Y^„(f))  = 

xy  xy 

Im(Y  (f)]  = 0 = G (f),  Vf  and  thence  it  follows  that 
xy  xy 

R (t)  = 0,  yT.  Hence,  we  see  that  if  two  processes 
xy 

are  incoherent,  then  they  are  also  uncorrelated.  However, 

as  stated  earlier,  being  incoherent  does  not  necessarily 

imply  being  independent.  For  example,  suppose 

y(t)  = n(x)  and  x(t)  is  a zero  mean  stationary  random 

2 

Gaussian  process  with  variance  o and  first  order 
probability  density  motion  (pdf), 

p(„  . — 1 ,■ 

yj  2Tro^ 

then  from  Nuttall  (1958)  and  Carter  and  Knapp  (1975) 

Rxy(0  » K R^^(t),  (2-22) 


where 
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K -4?  r n(x)x  — ^ e“^  dx  . (2-23) 

Thero  f’orc' , for  ev(!n  non  l inoiirltles,  K=0  and  R (x)=0. 

xy  ' 

Hence  G (f)=0  and  C (f)=0.  Thus,  It  Is  simple  to  derive 
X y xy 

a process  y(t)  which  is  completely  dependent  on  x(t) 
but  which  is  uncorrelated  with  it.  Hence,  the  converse 
of  theorem  (2-1)  does  not  hold  and  coherence  does  not 
provide  information  on  dependence  or  independence  but 
only  on  second  order  measures  like  correlation. 

2B2.  Measure  of  System  Linearity 
The  MSC  function  can  be  used  to  measure  system 
linearity.  In  Figure  2-2  consider  the  linear  system 
with  input  x(t),  impulse  response  h(x),  and  output  y(t). 
The  output  y(t)  is  expressed  by  the  convolution  integral 
00 

y(t)  = f h(x)x(t-x)  dx  , (2-24a) 

.00 

or 

y(t)  = h(t)  • x(t)  , (2-24b) 

where  O denotes  convolution. 

In  the  Fourier  domain  the  convolution  is  the  multi- 
plication (Oppenheim  and  Schafer  (1975)) 

Y(f)  = H(f)X(f)  , (2-25) 

where  X,  H,  and  Y are  Fourier  transforms  of  x,  h and 
y,  respectively. 
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Figure  2-2  Linear  System  with  Impulse  Respone  h(x) 


24 


theorem  2-2: 

If  a system  is  linear  then 

i (b  ( f ) w 
Y (f ) = e'^^xy^  , V f 

'xy 

and  hence 

C (f)  = 1.  vf. 
xy 

PROOF  OF  THEOREM  2-2; 

For  linear  systems, 


(2-26) 


(2-27) 


or  when  G (f)  f 0 

A A o 


**  G (t)G;„ (() 

G (f)  ^ °xx'*>- 

yy  o2^(f) 


(2-28) 


(2-29) 


substituting  Gyj,(f)  into  the  basic  definition  of  CC, 


G (f) 

Y V— ^ 

'xy  ■ 


|G^f)|  e^^xy^^^ 


j4)  (f)- 

= e^'^xy' 


(2-30a) 


(2-30b) 


Further , 

Vf)  = eos^  I »^y(f)ltnln^lV'" 

discussion  of  theorem  2-2:  This  theorem  Is  related  to 

work  of  Koopmans  (1964),  Jenkins  and  Watts  (1968), 

Otnes  and  Enochson  (1972),  Carter,  Knapp  and  Nuttall 
(1973a),  Koopmans  (1974),  BrllUnger  (1975),  an 
Halvorsen  and  Bendat  (1975).  This  theorem,  experience, 
and  certain  Intuition  lead  one  to  believe  the  converse 
of  the  theorem  should  also  be  true.  To  date  no  proof 
has  been  presented  for  the  converse.  Notably,  It  m 
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the  converse  which  would  play  a most  important  role  in 
the  applications  area.  This  is  because  one  is  seldom 
given  a linear  system  and  asked  to  measure  MSC.  Rather, 
one  is  given  an  unidentified  system  and  asked: 

"Is  it  linear?".  In  the  past,  if  the  MSC  was  unity,  one 
had  a "hunch"  that  this  was  true  but  no  rigorous  proof 
existed  to  assert  this  truth.  The  following  theorem 
acts  to  clarify  this  dilemma  and  indeed  show  what  can 
and  what  cannot  be  said  about  linearity  when  the  MSC 
is  unity. 

The  strongest  theorem  which  can  be  proved  in 
this  regard  is  as  follows: 

THEOREM  2-3:  If  C^y(f)=l,Vf,  then  with  probability  one 

there  exists  an  optimum  filter  with  unique  transfer 
function  H^(f)  that  can  act  on  the  input,  x(t),  to  an 
unidentified  system  to  achieve  output  y^Ct)  exactly 
equal  in  every  detail  to  the  output  y(t)  of  the 
unidentified  system,  (that  is,  yQ(t)=y(t),  with 
probability  one).  Moreover,  the  phase  of  the  filter 
Arg  H^(f)  = = Arg  Yy^(f)  • 

In  order  to  prove  theorem  2-3,  it  is  necessary  to  introduce 
and  prove  a lemma. 

LEMMA  2-1:  If  G^g(f)  is  the  power  spectrum  of  an  ergodic 

random  process  with  member  function  e(t)  and  if  G^^(f)=^0, 
Vf,  then  e(t)  equals  zero  with  probability  one  for  all  t. 
PROOF  OF  LEMMA  2-1:  From  p.  150,  Papoulis  (1965),  the 
Chebycheff  (or  Tchebvcheff)  inequality  is 
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2 

Prob  { |e(t)-E  [e(t)]  |<e}  ^ 1 ^ (2-32) 

n 

where  e>0  can  be  made  arbitrarily  small  and  is  the 
variance  or  power  of  e(t).  The  autocorrelation  function 
of  e(t)  is 

^ee^^^  = df,  (2-33) 

but  G (f)=0,  Vf  so  that  R (t)=0,  Vt.  In  particular 

V.T  c C C 

Ree(O)  = f [e^(t)]=0  = + E^[e(t)]  . (2-34) 

n 

Hence  a =0  and  E[e(t)]=0.  Alternatively  note  that  the 
value  of  the  tviils  of  the  autocorrelation  is  related  to 
the  mean  value  of  the  function.  Specifically,  (p.  333, 
Papoulis  (1965)) 


“ E,2[e(t)]  . (2-35) 

So  since  R (t)=0,  Vt 
ee  ’ 

lira  Ree(T)  * 0 , (2-36) 

^-►00 

it  follows  that 

E^[e(t)]  = 0,  (2-37) 

and  thus  that 

E [e(t)]  - 0 . (2-38) 

2 

Therefore,  the  Chebycheff  inequality  with  c -0  and 
E[e(t)]«  0 is 

Prob  [|e(t)!<e]>  1 , (2-39a) 
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but  0 _<  Prob[^  J _<  1 so  that 

Prob  [|e(t)|<e]  = 1 ; (2-39b) 

that  is,  the  probability  that  |e(t)|  is  less  than  somo 
arbitrarily  small  value  is  one.  Statistically,  we  say 
that  this  event  happens  "with  probability  one"  or  we 
say  that  it  happens  "almost  surely."  So  when  the  power 
spectrum  G^^(f)  of  this  random  process  is  zero  for  all 
frequencies,  then  e(t)=0  with  probability  one. 

DISCUSSION  OF  LEMMA  2-1: 

The  interpretation  of  the  results  can  be 
misleading  for  transients  (nonstationary  processes). 

For  example,  considei'  (see,  for  example,  p.  93  of  Lee 
(I960)), 


1 i m 1 2 «> 

2T  e^t)dt  = Rg^(O)  = /„  Ggg(i)  df. 


(2-40) 


Now  clearly  there  exists  e(t)^  0 such  that 


^ ^ e^(t)dt  = 0 . (2-41) 

For  example,  if  a finite  energy  pulse  lasts  only  a few 
seconds,  then  the  power  (or  "average"  energy)  in  such 
a nonrepetitive  pulse  is  zero.  This  is  because 


lim  f T 

T->-oo  _T 

equals  some  nonzero  constant  energy  but 


lim  _L  r'^ 
T-^»  2T  " 


e^(t)dt 


equals  zero;  hence,  the  power  is  zero.  Transient 
situations  of  this  type  are  disallowed  by  the  erpodicity 
constraint  vrhich  requires  stationarity . (Ergodic 
processes  are  stationary  but  not  necessarily  vice  versa.) 
The  essence  of  the  proof  then  is  that  for  ergodic  random 
processes  almost  surely  e(t)=0  in  that  frequency  band 
where  G^g(f)=0.  This  is  a reasonable  practical  assumption 
however,  it  should  not  be  overlooked  that  there  exists  a 
nonstationary  class  of  processes  for  which  the  proof  of 
LEMMA  2-1  does  not  apply.  We  now  proceed  with  the  proof 
of  theorem  2-3. 

PROOF  OF  THEOREM  2-3:  It  is  instructive  to  visualize 

the  proof  as  attempting  to  select  an  optimum  filter  such 
that  the  minimum  mean  squared  error  (MMSE)  is  achieved, 
where  the  error  e(t)  is  defined  as  e(t)*y(t)-y^vt) , 
as  shown  in  Figtre  ;':-3. 

The  solution  will  make  no  presumptions  on  the 
origin  (source)  of  y(t).  It  is  useful,  however,  to 
envision  y(t)  as  the  stationary  output  of  an  unidentified 
system  as  depicted  in  Figure  2-4;  such  a model  is  a 
special  case  of  Figure  2-3,  but  is  perhaps  a more  common 
system  identification  problem.  Whether  the  error  signal 
e(t)  is  generated  from  Figure  2-3  or  Figure  2-4,  it 
follows  that  the  total  power  is  given  by 


1 im  m f 
-T 


/2 

/2 


e^( t )dt 


= / 


G ^(f) 

ee  ' 


df 


^00 


(2-42) 
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Filtering  x(t)  to  Match  Any  Desired 
Signal  y(t) 


i 

! 
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n(t) 


Figure  2-4  Model  of  Error  Resulting  from  Linear 
Approximation  of  Nonlinearity 
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All  power  spectra  have  the  property  that  they  are  non- 
negative. The  implication  is  that  in  integrating  over 
the  interval  there  will  be  no  portions  of 

G^g(f)  that  will  "cancel"  other  portions.  Solving  for 

G (f),  it  can  be  shown  that 
6“ 


®ee^^^=^y^^^  G^^(f ) |H(f ) r-H(f  )G^y(f ) 

-H*(f)G*y(f), 


(2-43) 


which  can  be  written,  as  done  by  Carter  and  Knapp  (1975), 


2 r 1 

G (f)=G  (f)|H(f)-  vvxl  +G  (f)  1-C  (f) 

ee  XX  G (f)‘  yy  L xy  -I 


(2-44) 


Since 

G (f)>0,  G (f)>0,  andO<C  (f)<l, 
xx'  — ’ yy'  ' — ' — xy'  - ’ 

it  is  necessary  to  minimize 

'«<'>- -5^)1  • 


which  is  done  by  selecting  the  optimum  linear  filter 


G (f) 


G (f)  .. 

G (f) 

XX  ' 


(2-45) 


The  optimum  filter  is  a Wiener  filter  and  is  discussed 
in  texts  by  Lee  (1960)  and  Van  Trees  (1968).  The  Fourier 
transform  of  (2-45)  is  the  impulse  response 


h (i)  = r H^(f )e'^^^^^df 
o — ® o 


(2-46) 


In  general,  h^(x)  will  be  a nonzero  for  t<0;  hence, 
the  system  will  be  nonrealizable.  Various  methods  can 
be  applied  to  obtain  the  optimum  realizable  linear  filter; 
although  they  are  beyond  the  scope  of  this  thesis,  they 
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are  discussed  in  standard  texts  such  as  Lee  (1960)  or 
Van  Trees  (1968). 

From  (2-45)  the  cross  spectrum  between  x(t)  and 

y(t), 

<2-«) 

but  since  x(t)  excites  a linear  filter  H^(f)  to  produce 
output  y^(t),  it  also  follows  that 


G (f)  = H (f)G  (f) 
y x'  ''  o^  xx^  ^ 
o 


Substituting  (2-48)  into  (2-47)  yields 


G (f)  = G (f) 

yx 


Since  y(t)=e(t)  + yQ(t)  , 

Ry^(T)  = E{  [e(t)+y^(t)]x(t-T) 


= R (T)  + R (t) 
ex'  ' y^x'  ' 


(2-48) 


(2-49) 


(2-50a) 

(2-50b) 


But  by  taking  the  Fourier  transform  of  both  sides  of 
(2-49) 

R (t)  = R „(t)  . (2-51) 

yx  ’ 


Hence,  from  (2-51)  and  (2-50) 


( 3-52 ) 


that  is,  the  error  is  uncorrelated  with  the  input  x(t). 
This  is  an  interesting  property  of  the  error  signal 
in  it's  own  right.  When  x^(t)  is  linearly  filtered 
by  H^(f)  to  yield  y^(t)  for  i=l,2,  the  cross-power 
spectrum  of  the  filter  outputs  is  given  by  Davenport 
(1970)  as 


(2-53) 
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Hence  in  the  special  case  where  x^(t)=x(t),  X2(t)=e(t), 
H^(f)=H^(f)  and  H2(f)=l,  it  follows  that 

• (2-54) 

So  if  the  error  is  uncorrelated  with  x(t)  (that  is,  if 

G (f)=0),  then  it  must  be  true  that  G (f)=0  (that  is, 
xe  y^e 

the  error  is  uncorrelated  with  the  output  of  the  optimum 
filter).  The  waveform  x(t)  being  uncorrelated  with  e(t) 
implies  that  e(t)  is  also  uncorrelated  with  y^(t). 

Further , 


Rgy(x)  = E[e(t)y(t-x)]^  (2-55a) 

but  y( t )=e(t )+y^(t ) so  that 

Rey(x)  = E{e(t)[e(t-x)  + yQ(t-x)]}  (2-55b) 

= Ree(^^  ^ev  ' (2-55c) 

Recognizing  that  R (t)=0  and  taking  the  Fourier 
transform  of  both  sides  of  (2-55)  yields 

Gey(f)  = G^^(f)  . (2-56) 

The  selection  of  the  optimum  H(f)  forces  (2-44)  to 
become 


G (f ) = G (f ) fl  - C (f  )1 
ee'  yy'  [ xy'  ' j 


(2-57) 


When  C (f)=l,  clearly  (from  2-57)  G (f)=0,  and  thus 
xy  0© 

(from  LEMMA  2-1)  e(t)=0  with  probability  one,  but 

y(t)  = y^(t)  + e(t),  (2-58) 

so  that  almost  surely, 

y(t)  = yQ(t)  . 


(2-59) 
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Thus,  with  probability  one,  the  linear  filter 

G (f)  , 

”o^^^  = “v  (2-60) 

xx^  ' 

will  operate  on  x(t)  to  achieve  yQ(t)=y(t).  If  the 
optimum  output  yQ(t)=x(t)  © h^(t)  then  by  the  Fourier 
transform  relation 

Y^(f)  = X(f)H^(f)  . (2-61) 

The  Fourier  transform  is  a one  for  one  reversible 

transformation  so  that  a unique  x(t),  y(t)  implies  a 

unique  X(f),  Y(f),  but  then 

Y (f) 

u ( -t  \ = Q 

o^  ''  X(f)  (2-62) 

must  be  unique.  This  completes  the  proof  of  theorem  2-3. 
DISCUSSION  OF  THEOREM  2-3: 

Unique  transfer  functions  do  not  identify 
unique  systems.  Indeed,  nothing  is  known  about  the 
internal  structure  of  the  unidentified  system.  Further, 
the  fact  that  the  system  can  be  modeled  by  a linear 
system  H^(f)  such  that  when  both  (system  and  model)  are 
stimulated  by  an  excitation  x(t)  they  yield  identical 
output  y(t)  does  not  prove  that  the  system  is  linear 
over  all  inputs.  There  may  indeed  be  unobservable 
nonlinearities  in  the  unidentified  system.  For  example, 
suppose  the  excitation  x(t)  is  stationary  but  with  first 
order  pdf  such  that  -A  £|x(t)|j<A.  This  implies  that 
x(t)  never  excites  the  unidentified  system  for  amplitudes 
greater  than  A;  hence,  no  conclusions  can  be  drawn 
about  the  linearity  of  the  system  over  all  inputs. 
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Many  "real  world"  systems  are  linear  over  a certain 
range  of  amplitudes  and  then  saturate  above  that 
amplitude  as  in  the  case  of  anal(>g  computers 
(Kochenburger  (1972)).  As  another  example,  consider 
any  stationary  x(t).  The  stationary  excitation  has 
only  one  invarianc  power  spectrum  Systems 

which  appear  linear  for  some  G (f)  but  which  are 
clearly  nonlinear  for  different  input  statistics  are 
simple  to  envision.  If  a system  is  nonlinear  but  the 
nonlinearity  is  not  excited  (or  more  generally,  not 
observed),  then  the  system  will  appear  linear  and  the 
measurement  of  the  MSC  will  equal  unity.  In  essence 
then,  the  class  of  nonlinear  functions  is  so  larg<'  that 
based  on  a single  excitation  (even  white  Gaussian  noise) 
it  is  impossible  to  claim,  without  qualification,  that 
a system  is  "linear"  simply  because  the  MSC  is  unity,  for 
all  probed  frequencies.  Another  type  of  nonlinear  system 
is  one  in  which  the  MSC  is  observed  to  be  unity  in  some 
frequency  bands  and  not  unity  in  other  bands.  Thus 
yQ(t)  f y(t),  unless  those  frequency  bands  which  cannot 
be  accounted  for  by  linear  processing  are  removed.  More 
precisely,  if  C (f)  = 1 in  the  frequency  band  (f.,,f„) 
then  with  probability  one  there  exists  an  optimum  linear 
filter  with  unique  transfer  function  H^(f)  that  can 
act  on  x(t)  to  achieve  optimum  output  V^^Ct)  where 
y^j(t)  = y(t)  • hj(t)  and  hj(t)  is  the  impulse  i'<^sponse  of 
an  ideal  zero  phase,  unity  gain  "box  car"  filter  that 
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passes  only  those  frequencies  in  the  band. 

The  whole  problem  of  nonlinear  systems  can  be  treated 
by  considering  what  proportion  of  a system  output  can 
b(!  attributed  to  a linear  operation  and  what  proportion 
is  due  to  a residual  or  nonlinear  operation.  In 
general,  the  power  spectrum  of  the  optimum  output 


G (f)  = IH  (f)i  G (f) 
^0^0  ' o'  xx'  ^ 


(2-63) 


or  substituting  (2-1),  (2-2)  and  (2-45)  into  (2-63)  yields 

G (f)  = G (f)C  (f)  . (2-64) 

^0^0  xy'  " 

This  important  result  (Carter  and  Knapp  (1975))  can  be 
rewritten  as 

(2-65) 


^ y 

c (f)  -210 

G.„(f) 


yy 


The  implication  is  that  the  MSC  measures  the  portion 
or  amount  of  power  (G  (f))  which  can  be  obtained  through 

j «y 

optimal  linear  filtering  (in  the  MMSE  sense)  of  x(t). 
Moreover,  it  is  always  true  (provided  C^y(f)  is  defined) 
that 


O (f)  = C„„(f)G„„(f) 


yy 


xy  yy 


+ fi-c  (f)lG  (f) 


yy 


(2-66) 


Substituting  from  (2-64)  and  (2-57)  into  (2-66) 


yields 

G (f)  = G , (f)  + G (f)  , (2-67) 

yy  ' ee'  ' ' 

which  implies  the  power  spectrum  of  the  output  of  a 
system  is  comprised  only  of  the  sum  of  an  error  spectrum 
and  an  optimum  spectrum.  This  same  result  can  be 


noticed  from 
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" O-'o 


o 

^y  (-t)«0,vt  so  that 

° o 

' "eet')  + R y (t)  . 

0^0 


(2-69) 


computing  the  Fourtet  transto™  of  (2-a9,  venftes 
(2-67) . 

Just  as  the  MSC  measured  what  portion  o,  G (t, 
uould  he  Obtained  hy  (optical)  linear  mterlng,  on" 
«nus  MSC  is  a measure  of  the  portion  of  output  power 
ue  to  an  uncorrelated  error  component;  that  is. 


G (f ) 
66'  ' 


(2-70) 


Thus,  it  follows  that  the  ratio  of 

tio  of  the  optimum  linear 

power  to  the  nonlinear  or  error  power  is 


G (f ) 
o'*^o 
G (7) 

ee'^  ' 


C il) 
xy^  ' 

1 - C (f) 
xy  ^ ' 


(2-71) 


(This  ratio  win  he  Important  in  the  estimation  of  time 
delay.  ) 

Rcr  practical  nonlinear  systems,  the  Identification  of 
the  optimum  linear  component  is  not  always  obvious. 

For  example,  m ;he  system  without  noise  described  by 
F(t)-(t)fb  x(t),  the  optimal  linear  part.s^bx(t, 
To  Clarify  this  point,  it  will  be  demonstrated  that  for 
a limited  class  of  inputs  and  a limited  class  of  non- 
linearities,  analytic  expressions  for  the  optimal  linear 
part  can  be  obtained.  This  offers  interesting  insight 
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into  both  the  general  sys.'rm  identification  problem  and 
the  coherence  interpretation  problem.  First,  the 
nonlinearity  is  constrained  to  have  no  memory  and  no 
noise,  that  is,y=n(x).  Second,  the  input  processes  are 
constrained  to  be  separable  in  the  sense  defined  by 
Nuttall  (1958).  A separable  process  with  second-order 
pdf  p(x^,X2;x)  and  mean  p is  defined  as  one  for  which 
the  integral  /”(x£p)prXj^,X2;T)dx^  separates  into  the 
product  of  a function  of  X2  alone  and  a function  of 
T alone.  For  example,  it  can  be  shown  that  a Gaussian 
process  possesses  these  properties  and,  hence,  is  a 
separable  process. 

Under  the  no-memory  nonlinearity  and  separable 
process  constraints,  it  has  been  proved  by  Nuttall  (1958) 
the  crosscorrelation  between  x(t)  and  y(t)  at  delay  x is 
given  by 

R (X)  = K • R (X)  . (2-72a) 

yx  XX  ’ 

where 

1 

K = / n(x)(x-u)p(x)dx  , (2-72b) 

0^  -00 

p(x)  is  the  first-order  pdf  of  x(t),  n(x)  is  a complete 

description  of  the  no-memory  nonlinear  function,  and 
2 

0 ' is  the  variance  of  x(t).  Notice  that  the  constant  K 
does  not  depend  on  frequency  or  delay  but  only  on  the 
first-order  pdf  and  the  nonlinearity.  It  follows  directly 
from  (2-72a)  that,  for  no-memory  nonlinearitios  excited 
by  separable  processes. 
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Y (f)  = Y (f)  = K 
'yx  xy 


VG  ( 
_«L 
G ( 
yy 


I) 

n 


(2-73> 


Comparison  ol’  (2-73)  with  (2-45)  and  (2-1)  shows  l.haL 
tlu!  constant  K is  the  optimum  linear  filter  in  the; 
MMSE  sense. 

As  an  example,  suppose  x(t)  has  a Gaussian 
2 

zero-mean,  a—  variance  pdf;  then 


K = ~ / n(x)x  — — e dx. 


0^  -« 


xT 


(2-74) 


2Tro' 


Whenever  the  pdf  is  even  and  n(x)  is  an  even  function, 
K=0  so  that  the  coherence  is  zero.  However,  when  n(x) 
is  an  odd  function,  K does  not  necessarily  equal  zero 
oven  though  the  unidentified  system  is  nonlinear.  For 

3 

example,  when  n(x)=x  it)+bx(t),  application  of  (2-74) 

2 

yields  K=3o  +b.  Therefore,  the  optimal  linear  part  of 

x^(t)+bx(t)  is  not  bx(t)  but  rather  y^( t )=(b+3o^)x( t ) 

2 

for  a zero  mean  Gaussian  process  with  variance  of  a . 

For  b=0,  it  follows  that  K ^ 0 and  C (f)  ^ 0 provided 

xy 

G (f)  ^ 0.  However,  if  b=-3o^,  then  K=0  and  C (f)=0. 
Thus,  the  MSC  may  still  be  zero  even  though  the  non- 
linearity is  not  even.  A computer  simulation  of  the 
exampJe  with  o = V2  conducted,  and  the 

results  verified  the  theory  (Carter  and  Knapp  (1975)). 
This  result  can  be  independently  verified  by  calculating 

Rjjy(T)  = E{x(t)^x^(t-T)+bx(t-x)]  } , which  for  Gaussian 
2 

processes  is  3o  R (i)+bR  (x).  Therefore,  C (f)=0  if 

XX  XX 

2 

b=-3o  , and  there  is  no  power  in  the  optimum  linear  part 
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3 2 

of  the  nonlinearity  n(x)  = x (t)  - 3a  x(t). 

Parenthetically,  we  note  that  another  approach 
to  this  problem  is  to  expand  the  no-memory  nonlinearity 
T|  as  an  Infinite  series  of  orthogonal  polynomials, 
spt'c  i [ j (‘,al  1 y , 

on 

y(t)  = n[x(t)]  = T.  a^H^  [x(t)]  , (2-75a) 

n=0  *^n 

where  the  H (x)  are  e Hermite  polynomials  (see,  for 
^'n 

example,  p.  xxxv,  Gradshteyn  and  Ryzhik  (1965) ) 

H (x)=l,  H (x)=x,  H (x)=x^-l,  H (x)=x^-3x 
e ^ ’ e,  ’ e„ 

and  in  general 

H (x)=xH  (x)-nH  (x)  . (2-75b) 

^n+1  ‘^'n  %-l 

Then,  (he  erosscorrelat ion  between  x and  y is  givt^n  by 

on 

(i)  = ):  aj^E{x(t)H^.  [x(t-c)]}  . (2-7G) 

■ n-0  n 

The  advantage  to  this  method  is  (.hat,  if  the  family  of 
cor re  la t j ons 

Rxh  " E{x(t)ri^  [x(t-T)]},  n=l,2,...  (2-77) 

n 

had  been  computed  once,  orthogonal  expansion  of  n(x) 
makes  r ) immediately  apparent  by  a simple  weighted 

summat ion . 

It  is  perhaps  germane  to  clarify  the  significance 

of  knowing  that  the  MSC  is  unity.  Just  as  C ^(f)=l  for 

xy 

all  f ensured  that  there  was  some  linear  filter  that 
mapped  x(t)  into  y^(t)=y(t)  exactly,  there  also  exists 


a linear  filter  which  maps  y(t)  into  x(t)  exactly.  That 
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is,  since  \G^  (f)|^  = |G  C (f)  = C (f)  and 

conclusions  drawn  with  regard  to  x(t)  and  y(t)  have 
an  analogous  relation  between  y(t)  and  x(t).  Thus, 
even  though  one  cannot  make  unqualified  statements 
about  the  unidentified  system,  there  certainly  exists 
a total  detailed  knowledge  of  its  output  for  a given 
input  and  therefore,  all  of  its  output  statistics  wiien 
the  MSC  is  unity  and  the  input  remains  unchanged.  All 
this  is  accomplished  through  the  utilization  of  a 
linear  (though  not  necessarily  realizable)  model. 

2B3.  Measure  of  Signal-to-Noi so  Ratio 

The  coherence  can  be  used  for  determining  SNR 
as  will  be  discussed  in  this  Jrection.  The  results  of 
this  section  are  of  interest  from  two  points  of  view. 

First,  the  SNR  is  a fundamental  concern  in  the  basic- 
passive  detection  problem  and  parameter  estimation  problem, 
and  second  the  results  of  this  section  will  aid  in  the 
interpretation  of  optimum  delay  estimation  and  variaiuo 
of  the  estimate  of  coherence  phase.  Hc'iicc',  while  Ltu-se 
results  can  be  derived  independent  of  the  time  delay 
estimation  problem,  they  will  form  an  important  rcjle  in 
the  understanding  of  how  to  estimate  time  delay  or 
source  bearing. 

When  x(t)  is  linearly  filtered  to  yield  output 
y(t)  and  the  output  is  corrupted  by  uncorrelated  additive 
noise,  as  depicted  in  Figure  2-5,  then  the  noise  power 


spectrum  is 
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G (f)  = G (f)  [l  - C (f)l 
nn'  yy''  [_  ^xy^  J 


(2-78) 


This  is  an  intuitively  satisfying  result  since  the  MSC 


is  unity  if  there  is  no  noise,  whereas  thp  MSC  IG  zjCI'O 


when  the  output  is  all  noise.  For  linear  systems, 


additive  noise  uncorrelated  with  the  input  reduces  the 


MSC  according  to  the  ratio  of  G (f)  to  G (f). 

° nn'  ' yy  ' 


Measurement  of  is  useful  not  only  in  the  image 


processing  problem  discussed  by  Cannon  (1974)  but  also 


in  studying  the  gross  effects  of  digital  filtering  when 


viewed  as  a perfect  filter  plus  additive  noise  (James 


(1975)  and  Weinstein  and  Oppenheim  (1969)).  These 


methods  can  also  be  applied  to  studying  special  problems 


such  as  fast  Fourier  transform  (FFT)  noise  (Ferrie  and 
Nut  tall  (1971)  and  Rabiner  and  Rader  (1972)). 


The  power  spectrum  from  the  output  of  an  arbitrary 


system  can  always  be  viewed  in  terms  of  its  two  components 


G (f)C  (f)  and  G (f)  1-C  (f)|  regardless  of  how 

yy  ^y  yy  ^y  j 


G (f)  is  produced  (as  long  as  C (f)  is  defined).  It  is 

yy  ^y 


interesting  to  note  that  the  ratio  of  these  components 


S y 

^ r\^  t^\ 


G (f ) 
zz^  ' 


C (f ) 
xy" 


(2-79) 


G (f ) 
ee 


nn 


1-C  (f) 

xy 


can  be  considered  as  either 


♦•Kci  1-1*^ 


nonlinear  ratio,  depending  on  the  application, 


For  situations  like  those  shown  in  Figure  2-5, 


the  coherence  measures  what  proportion  of  an  unidentified 


system  output  is  "linear."  Through  the  use  of  (2-79), 


the  M3C  provides  a comparison  of  the  proportion  of  system 
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power  that  is  linear  with  the  proportion  that  is 
nonlinear  in  exactly  the  same  way  in  which  the  SNR 
was  measured  For  the  output  of  n linear  system  corrupted 
by  additive  noise.  However,  in  other  system 
configurations,  such  as  that  shown  in  Figure  2-6,  where 
noise  and  signal  have  a different  interpretation, 
relation  (2-79)  will  not  be  useful.  Figure  2-6  is  of 
interest  to  the  sonar  community  since  it  is  analogous 
to  the  physical  situation  in  which  signal  s(t)  from  an 
acoustic  source  is  received  at  two  geographically 
separated  sensors.  Each  observed  signal  is  corrupted 
by  additive  stationary  noise  and  is  linearly  filtered. 
When  n^(t)  and  n2(t)  are  uncorrelated  but  have  the  same 
power  spectra  G (f),  the  SNR,  G (f)/G  (f)  is  readily 

shown  to  be 


which  differs  from  (2-79).  (Note  from  (2-19)  that 
C (f)=C  (f).)  Ironically  it  will  turn  out  to  be 

(2-79)  and  not  (2-80)  which  is  critical  to  our  problem. 
In  cases  where  each  transmission  path  attenuates  the 
source  signal  differently,  the  model  must  be  changed 
to  reflect  an  attenuation  in  one  channel.  Unless 
simplifying  assumptions  are  employed,  the  net  result 
is  that  ^ ) cannot  be  determined  from  ^^^(f) 

unless  attenuation  in  each  path  is  known.  (See 
section  4 of  appendix  B. ) 


(t) 


Model  of  Directional  Signal  Corrupted  with  Additive  Noise  and 
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More  generally,  the  source  is  transmitted 
through  two  ocean  medium  operators  and  H2(f) 

as  shown  in  Figure  2-7,  which  can  attenuate  the  signal 
differently  at  different  frequencies.  For  illustrative 
purposes,  we  assume  that  the  ocean  medium  operators  are 
linear  time  invariant  filters.  Thus  s^(t)  and  S2(t) 
are  the  outputs  of  filters  H^(f)  and  H2(f),  respectively, 
which  have  been  excited  by  source  s(t).  This  model  of 
linear  filters  and  noise  is  mathematically  tractable 
and  has  been  proposed  before,  as  for  example,  on 
p.  389  of  Whalen  (1971).  (More  sophisticated  models  are 
given  by  Kennedy  (1969)-)  When  the  noise  n^(t)  is 
uncorrelated  with  the  signal  s^(t),  the  power  spectral 
density  at  the  output  of  the  i-th  sensor  is  given  by 

1*1,2  (2-81a) 

= ^ (f)  + „ (f),  i=l,2  . (2-81b) 

®i®i  "i"i 

Further,  the  ratio  of  the  power  at  the  output  of  the 
filter  to  the  corruptive  noise  power  depends  on  the  MSC 
between  the  source  and  the  sensor.  Specifically,  from 
equation  (8)  of  Carter,  Knapp  and  Nuttall  (1973a)  or 


(2-79), 


G ^ (f ) 
s.s.^ 

G „ (f ) 
"i"i 


sx. 


1 - 


(Note  that  when  |H^(f)|=/=l, 


(f) 

C~Tf7  ’ ■ (2-82) 

i 

(2-82)  does  not  measure  the 


ratio  of  source  to  noise  power.)  The  coherence  between 


K Two  Linear  Time  Invariant 
Observed  in  the  Presence  of 
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x^(t)  and  X2(t)  in  Figure  2-7  when  and  n^^Ct)  and 
ngCt)  are  uncorrelated  Is  given  by 


Yx  X = 


{\^i 


(.1)0  „ (f) 
^2  2 


(2-83) 


In  order  to  relate  this  result  to  the  coherence 
between  the  source  and  each  sensor,  note  that. 


G (f)H.(f) 
ss  1 


^ G„„(f)G„  „ (f) 


, i = l,2 


(2-84) 


ss 


so  that 


X " ^sx  ^^^^sx  • 


(2-85) 


Taking  the  magnitude-squared  yields 


^x  x = ^sx  ^^>^sx  • 


(2-86) 


Thus,  when  a source  drives  two  linear  time  invariant 

filters  whose  output  is  observed  in  the  presence  of 

uncorrelated  noise,  the  MSC  between  the  outputs  can 

be  no  larger  than  the  MSC  between  the  source  and  any 

sensor.  In  particular,  for  two  sensors  the  MSC  is 

th  ' product  of  the  two  source  MSCs,  as  given  in  (2-86). 

However,  it  is  possible  to  have  a source  transmitted 

through  some  nonlinearity  such  that  the  MSC  between 

s(t)  and  Xj^(t)  is  low  and  the  MSC  between  s(t)  and 

Xg(t)  is  low  and  the  MSC  between  Xj^(t)  and  Xg(t)  is  high. 

For  examp] e , suppose  s(t)  is  a member  function  of  a 

stationary  random  process  which  is  separable  in  the 

2 

Nuttall  sense.  Then  the  MSC  between  x^^(t)  = s (t)  and 
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2 

s(t)  is  zero;  similarly^ the  MSC  between  XgCt)  = s (t) 
and  s(t)  is  zero;  however,  for  this  example,  the  MSC 
between  x^(t)  and  X2(t)  is  unity.  Thus, care  should 
be  used  in  interpreting  these  results  since  they  apply 
only  to  the  case  where  the  medium  can  be  accurately 
modeled  by  linear  time  invariant  filters  corrupted  by 
uncorrelated  additive  noise. 

Using  (2-86)  we  can  compute  a SNR  squared  quantity, 


namely , 


G (f) 

G (f ) 


G (f ) 


C (f ) 
^1^2 


. (2-87) 


*^2"2 


To  be  useful  (2-87)  requires  knowledge  of  the  source  to 

i 


sensor  MSCs.  However, if  C (f)=C  (f)= 

sx^ 


=rc 

L 


(f) 


then  it  follows  that 


(f) 


(2-88) 


(f) 


The  results  on  coherence  from  this  chapter  will 
add  to  the  understanding  of  the  role  of  coherence  in 
ML  estimation  of  time  delay  as  will  be  seen  in  the  next 


chapter. 


CHAPTER  3 


MAXIMUM  LIKELIHOOD  ESTIMATE  OF  TIME  DELAY 

In  the  first  section  of  this  chapter  an  ML 
estimator  is  derived  for  determining  time  delay  between 
signals  received  at  two  spatially  separated  sensors  in 
the  presence  of  uncorrelated  noise.  This  ML  estimator 
can  be  realized  as  a pair  of  receiver  prefilters  followed 
by  a crosscorrelator.  The  time  argument  at  which  the 
correlator  achieves  a maximum  is  the  delay  estimate. 

In  the  second  section  of  this  chapter,  the  variance  of 
the  time  delay  estimate  is  derived  and  compared  with 
the  Cramer-Rao  lower  bound,  and  in  the  final  section, 
various  realizations  of  the  processor  are  considered. 

3A.  Derivation 

For  the  purposes  of  the  derivation,  a signal 
emanating  from  an  acoustic  source  and  monitored  in  the 
presence  of  noise  at  two  spatially  separated  sensors 
can  be  mathematically  modeled  as  depicted  in  Figure  3-1. 
Mathematically , 

Xj^(t)=Sj^(t)+nj^(t)  (3-la) 

X2(t)=L'.Sj^(t+D)+n2(t ) , (3-U>) 

where  Sj^(t),  Uj^(t),  and  n2(t)  are  real,  Jointly  stationary 
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random  processes.  The  delay,  I),  is  the  unknown  parameL(^r 
to  be  estimated.  Signal  s^(t)  is  assumed  to  be 
uncorrelated  with  noise  n^(t)  and  n2(t).  Later  we  also 
asSv'ine  nj^(t)  and  02^^^  uncorrelated  with  each  other. 

More  generally,  it  may  be  assumed  that  S2(t) 
is  lliicarly  related  to  s^(t)  by  the  transfer  function 
H(f)=|a(f)[e  . Thus,  unlike  (3-1)  where  the 

Fourier  transform  of  the  system  output  is  as^( f . 
the  output  transform  in  this  case  is  |a(f ) |s^(f 
The  linear  phase  characteristic  of  such  a system  is 
assured  when  the  impulse  response  is  symmetric  about 
t=D.  For  realizable  systems,  this  implies  that  the 
duration  of  the  impulse  response  must  be  finite.  Thus, 
in  a sense,  we  are  estimating  the  midpoint  of  a symmetric 
finite  impulse  response  (FIR)  filter  depicted  in 
Figure  3-2a.  Such  an  impulse  response  is  not  necessarily 
peaked  at  D (as  for  example  in  Figure  3-2b) , In  the 
derivation  which  follows,  then,  a can  (more  generally)  be 
interpreted  as  a frequency  dependent  attenuation  |a(f)|. 

There  are  many  applications  in  which  it  is  of 
interest  to  estimate  the  delay  D.  This  chapter  derives 
an  ML  estimator  and  evaluates  its  variance.  Chapter  4 
compares  the  estimator  with  other  similar  techniques. 
While  the  model  of  the  physical  phenomena  presurr.Co 
stationarity , the  techniques  to  be  developed  herein  may 
be  employed  in  slowly  varying  environments  where  the 
characteristics  of  the  signal  and  noise  remain 


0 r(Mc) 


Figure  3-2  Syi;-.ietric  Impulse  Response  for  Two  FIR 
Linear  Phase  Filters 


( 

I 


5-1 

Stationary  only  for  fin! to  observation  time  T.  Further, 
the  delay  D and  attenuation  a may  also  chaiif^e  slowly. 

The  estimator  is  therefore  constrained  to  operate  on 
observations  of  a finite  duration.  Having  estimated  the 
delay,  an  estimate  of  the  bearing  may  be  obtained  by 
mapping  the  delay  estimate  according  to  (.NulLctll,  Carter 
and  Montavo.n  (1974)) 

A 

0 = arc  cos  ^ > (3-2) 

where  ^ is  the  nominal  speed  of  sound  in  the  non- 
dispersive  medium  and  d is  the  sensor  separation. 

(See  pp.  93-103  of  Urick  (1967).)  A rigorous  derivation 
for  the  ML  estimator  of  D using  the  mathematical  model 
(3-la)  and  (3-lb)  requires  that  signal  and  noise  spectra 
be  given  (that  is,  known).  (See  Hannan  and  Thomson 
(1971).)  When  they  are  unknown,  a heuristic  procedure 
of  estimating  these  spectral  characteristics  is  suggested. 
The  ML  estimator  of  delay  can  be  realized  as  a pair  of 
receiver  prefilters  followed  by  a crosscorrelator.  The 
time  argument  at  which  the  correlator  achieves  a maximum 
is  the  delay  estimate.  Qualitatively,  the  role  of  the 
prefilters  is  to  weight  the  signal  passed  to  the 
corre..ator  according  to  the  strength  of  the  coherence 
function.  This  weighting  turns  out  to  be  equivalent  to 
that  proposed  by  Hannan  and  Thomson  (1973)  and  under 
simplifying  assumptions  to  that  proposed  by  MacDonald 
and  Schultheiss  (1969). but  apparently  differs  from  the 
results  of  Clay,  Hinich  and  Shaman  (1973).  However,  the 


k 
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development  presented  here  does  not  presume  initially 
that  the  estimator  is  a GCC  function.  Rather,  it  is 
shown  that  the  ML  estimate  may  be  realized  by  prefiltering 
and  crosscorrelating  the  data  x^(t)  and  XgCt).  Indeed, 
other  realizations  of  the  ML  processor  are  also  possible. 
(See  section  3C  of  this  chapter.)  For  example,  the  data 
can  be  appropriately  filtered,  cumried,  squared  and 
averaged  in  order  to  estimate  the  delay.  This  latter 
processor  follows  directly  from  the  derivation  presented 
here  and  is  discussed  fully  in  3C. 

To  make  the  model  (3-1)  mathematically  tractable, 
it  is  necessary  to  assume  that  s^(t),  n^(t)  and  n2(t) 
are  Gaussian.  Denote  the  Fourier  coefficients  of  x^(t) 
as 

X.  (k)  * i X.  (t)e''^^^‘^Adt,  (3-3a) 

1 1 Q 1 


where 


(3-3b) 


Note  that  the  linear  transformation  X.(k)  is  Gaussian 

J, 

since  x^(t)  is  Gaussian.  In  practice,  the  integral  will 
be  replaced  by  a discrete  Fourier  transform  (DFT)  or 
FFT.  When  the  number  of  data  point. s in  each  FFT  is  large 
(as  will  usually  be  the  case)  then,  by  a central  limit 
theorem  argument,  X^(k)  will  tend  toward  being  Gaussian 
even  if  the  x^^(t)  are  not  Gaussian.^  This  presumption 


^These  observations  were  brought  to  the  authors 
attention  by  Dr,  G.  Mohnkern  of  the  Naval  Undersea  Center, 
San  Diego,  California. 
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is  borne  out  by  Benignus  (1969b).  Hence,  the 
requirement  that  s^(t),  nj^(t)  and  ngCt)  be  Gaussian 
is  not  a strong  requirement. 

As  the  observation  time 

T (k)  -*•  Xji^(ka)^)  , 

where  X.  is  the  Fourier  transform  of  x^(t).  A more 
complete  discussion  on  Fourier  transforms  and  their 
convergence  is  given  in  Davenport  (1970),  Jenkins  and 
Watts  (1968),  Koopmans  (1974),  Otnes  and  Enochson  (1972), 
Bendat  and  Piersol  (1971)  and  Brillinger  (1975).  From 
MacDonald  and  Schultheiss  (1969),  it  follows  for  T large 
compared  with  1d|  plus  the  correlation  time  of  R (t). 


that 


E [ X^(k)X^()l)l  S 


T x^X2(ko)^),  k=l 


(3-4) 


0 , . 

Note  that  EfX.(k'>]=  E[x^(t)]  =0,  i=l,2. 

Now  let  the  vector 

X(k)  = |^X^(k),X2(k)j  ' , 

where  ' denotes  transpose.  Then  the  covariance  o i'  X(k) 
is 

r 

X^(k)X|(k)  X^(k)X^(k) 
X2(k)X|(k)  X2(k)X^(k) 


(3-5) 


E [^X(k)X*  ' (k)]  =E 


(3-6) 


57 


T 


) G*  ^ (ko)^) 


^2^2 


(3-7) 


^ _L  Q (koj.) 
T X ^ 


(3-8) 


where  Q (w)  is  the  spectral  matrix  of  fx^( t ) , x^( t ) j . 

The  vectors  X(k),  k=-N,-N+l N are,  as  a 

consequence  of  ( 3-4 ),  uncorrel  at.-d  Gaussian  (hence, 
independent)  random  variables.  More  explicitly,  the 
pdf  for  X £ X(-N),X(-N+1) X(N).  given  attenuation 

a and  delay  D is^ 


where 


p(Xla,D)  = h-exp  - ^ > 


Ji  = X*' (k)Q‘^(ko)^)X(k)T 

^ k=-N 


(3-9) 


(3-10) 


and  h Is  a function  of  |Q„(ku,pl  (Van  Trees  (1968)). 
Replacing  TX^(k)  by  Xi(k«p,  the  Fourier  transtorn,  of 
x^(t),  it  follows  from  (3-10)  that 


J = Z X*  (kcj^)Qj^^(kw^)X(ka)^^),p 
k=-N 


(3-11) 


The  ML  estimate  of  D (see,  for  example,  Jenkins  and  Watts 
(1968)  or  Van  Trees  (1968))  is  the  value  of  D which 
maximizes  p(Xla,D). 


lu,.r^  pxnlicitlv  since  the  density  function  depends 
More  f.  X This  notation  obscures 

on  Q one  the  need  to  kn<aw  (or 

the  role  of  the  delay  bit  clarities  a=lu(f)l 

estimate)  signal  and  noise  spectr  . - > 

thl.1  the  pdf  is  conditioned  on  knowing  la(ku.^)l.  k-N. 

-N+1 N. 
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In  general,  the  parameter  D affects  both  h and 
in  (3-9).  However,  for  uncorrelated  noise  in  (3-1), 
h is  independent  of  the  delay. 


For  large  T,  (3-11)  becomes 


~ * ' 


= / X"  (f)Qj^-l(f)X(f)df 


(3-12) 


From  (3-6)-(3-8) , 


Q^(f) 


G ^ (f)  -G  (f) 


-g!  X X 

^1^2  11 


(3-13a) 


R/Gx  ^ (f).-G 
^1^1  ^1 


luc^- 


(f)-G  (f)V 

^2^2  f 


(3-13b) 


where  C.,«(f)EC  (f),  which  will  exist  provided 

that  is,  x^(t)  and  Xg(t)  cannot  be  obtained 

perfectly  from  one  another  by  linear  filtering 

(Carter  and  Knapp  (1975)),  or  equivalently  for  the  model 

(3-1)  that  observation  noise  i^  present. 

When  C (f)=G  (f)=0 

"l"2  "l"2 


^x  X s n - 

^11  ®1®1  ’’l"i 


(3-14a) 


G X (f)=a"G  (f)+G  (f)  , 

^2  2 ®1  1 ’’2  2 


(3 -14b) 
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G ( f )=aG  (f  ) e 

X1X2' 


~.]2t.  id 


C.o(f)=a^G  (f)  G (f)G  (f); 


(3-14r) 


(3-14d) 


and  it  follows  that 


= / X*'(f)Q"^(f )X(f )df=J2+J3  , 

— 00 


where 


(;5-ir.a) 


•^2=  ^ , 


x^(f) 


Xo(f) 


G (f)  G ^ (f) 

XiXi^  X2X2' 


1 - C^2(f) 


df  , (3-15b) 


-J^=  / A(f)+A  (f)  df, 

— 00 


A(f)=X^(f)X*(f)  . 


G ( f ) 
^1^2 


(3-15C) 


(3-15d) 


In  order  to  relate  these  results  to  Hannan  and 
Thomson  (1973)  and  others  and  interpret  how  to  Imple^ment 
the  ML  estimation  technique,  note  that  for  x^(t)  and 
x„(t)  real,  A*(f)=A(-f).  Then  (3-15c)  can  be  rewritten  as 

^ ao  «> 

-J„=  / A(f)df+  / A(-f)df=2/  A(f)df  . (3-lfi) 

^ _0f)  -00  -a> 

Letting  TG  ( f )^X, ( f )X* ( f ) , (3-16) and  (3-15d)  can  be 

X1X2  1 ^ 


written  as 


00 


-J,=2T  / G.  , (f) 


^12^  ^ ^ i 27t  fO 
— df  (3-17) 


Notice  that  the  ML  estimator  for  D will  minimi/,. 


J^=J„+Jo,  but  the  selection  of  D has  no  effect  on  J^. 

JL  ^ u 
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Thus,  D should  maximize  -J3-  Equivalently,  when 

X^(f)X2(f)  is  viewed  as  T times  the  estimated  cross- 

/\ 

power  spectrum,  TG  (f),  the  ML  estimator  selects  as 
the  estimate  of  delay  the  value  of  x at  which 


^1^2 
where 


„ (f)p r 

l’^2  rx.x_  [l-C-.Cf)] 


Ci2(f) 


JSTrfi 


X1X2 


(f)  = 


^1''2 

X^(f)X^(f) 


12' 


df, 


(3-18a) 


(3-18b) 


achieves  a peak.  That  is,  the  ML  estimator  selects  as 
the  estimate  of  delay  the  value  of  i at  which  the  GCC 


uu  . _ ^ 

R G (f)W  (f )e'^^’^^’^df 

’‘1^2  ^1^2  ^ 


(3-19) 


achieves  a peak,  where  Wg(f )=H^(f )H2(f ) is  an  appropriately 
selected  weighting  function  (Knapp  and  Carter  (1976)) 

The  ML  estimator  is  equivalent  to  one  proposed  by  Hannan 
and  Thomson  (1973).  The  ML  estimator  can  be  achieved  as 
depicted  in  Figure  3-3  by  shaping  x^^(t)  with  filter 
Hj^(f)  and  X2(t)  with  filter  H2(f)  then  crosscorrelat ing 
the  filter  outputs  and  observing  what  value  of  delay 
achieves  a maximum.  The  estimator  can  also  be  achieved 
in  other  forms.  (See  section  C of  this  Chapter.)  The 
weighting  proposed  by  Hannan  and  Thomson  (1973)  is 

W (f)  = 1 . . (3-20) 

(1  - C,2(f)l 

where  (as  required  for  Q to  exist)  C.,,,(f)  f 1.  Such 

X X ^ 


Figure  3-3  Received  Waveforms  Filtered,  Delayed, 
Multiplied,  and  Integrated  for  a 
Variety  of  Delays  until  Peak  Output 
is  Obtained 
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weighting  achieves  the  ML  estimator.  When  jO  (f)! 

X1.X2 

and  known,  this  is  exactly  the  proper 

wolghUng.  An  important  consideration  in  estimator 
design  is  the  available  amount  of  a priori  knowledge 
of  the  signal  and  noise  statistics.  In  many  problems, 
this  information  is  negligible.  For  example,  in  passive 
detection,  unlike  the  usual  communications  problem, 
the  source  spectrum  is  unknown  or  only  known  approximately. 
When  the  terms  in  (3-20)  are  unknown,  they  can  be 
estimated  via  techniques  of  Carter,  Knapp  and  Nuttall 
(1973a)^  which  are  summarized  in  appendix  A and 
programmed  in  appendix  C.  Substituting  estimated 
weighting  for  true  weighting  is  entirely  a heuristic 
procedure  whereby  the  ML  estimator  can  approximately 
be  achieved  in  practice.  Such  techniques  have  been 
referred  to  as  approximate  ML  (AML)  techniques  by  Box 
and  Jenkins  (1970)  since  they  are  not,  truly  speaking, 

ML  estimation  techniques. 

Since  the  estimation  of  delay  may,  in  practice, 
be  governed  by  an  AML  rather  than  an  ML  technique,  we 
should  not  expect  that  more  complex  models  will  yield  to 
ML  techniques  without  similar  heuristic  approximation. 
Rather,  the  estimation  of  D with  moving  sources,  for 
example,  will  also  require  AML  techniques  and  may 
be  even  more  prone  to  varying  interpretations. 


' The  crosscorrelation  form  of  the  processor  ’s 

j useful  in  ascertaining  the  statistical  characteristics 

of  the  delay  estimate.  For  each  of  several  different 
trials  a different  estimate  of  delay  might  be  obtained. 

.1 

j For  example,  when  the  true  delay  is  about  5.0  seconds, 

i six  typical  trials  are  sketched  in  Figure  3-4.  One 

actual  example  case  is  given  in  appendix  D.  In 

A 

ascending  orders,  values  of  D are  4.5,  4.9,  5.0,  5.1, 

5.3  and  5.7.  For  trial  number  5,  depicted  on  the 
Figure  3-4,  an  estimate  4.9  is  obtained.  However, 
there  appear  to  be  many  ambiguous  peaks  in  trial  5; 
j indeed  if  the  noise  had  been  slightly  different,  there 

could  have  been  a different  delay  estimate,  such  as: 

4.1,  5.7,  or  6.5;  such  an  error  would  increase  the 

i 

j variation  of  the  time  delay  estimate.  The  derivation  of 

I variance  of  D,  which  follows,  does  not  account  for  errors 

i 

j due  to  ambiguous  peaks.  It  presumes  that  the  estimated 

I delay  is  in  the  neighborhood  of  the  correct  delay  and 

] not  on  a secondary  peak. 

* A lower  bound  on  the  variance  for  any  delay 

' estimator  (which  is  not  necessarily  attainable)  is  given 

by  the  Cramer-Rao  bound 
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Cramer^ao  bounds  are  discussed  in  Van  Trees  (1968) 
and  Sape  and  Melsa  (1971).  The  only  part  of  the  lo^ 
pdf  that  depends  on  x,  the  hypothesized  delay,  is 
Jg  of  (3-17).  Thzit  is. 


E 


(d‘ 


•3t‘ 


In  p (x 


a,x)^  = 


3t 


2 


(3-22) 


If  G „ (f)  = 
^12 


G (f ) 

XfX2 


E 


G (f ) 

^1^2 


=G  „ (f) 
^12 


-j2TTfD  .. 

e , then  since 

it  follows  that 


E(:ij3)=T/ 


Ci2(f) 


II  - Cj2(f',| 


- df  . 


Hence,  the  minimum  obtainable  variance  for  delay 
estimation  is  (Carter  and  Knapp  (1976a)) 


(3-23) 


A 

Minimum  Var(D)= 


T-1 


T/  (2TTf) 


2 ^12^^^ 


df 


[1  - 


(3-24) 


For  the  GCC  processor  with  any  weighting 
W^(f )=Hj^(f )H^(f ) we  will  derive  an  expression  for  the 
local  variation  of  the  delay  estimator  and  show  that 
the  ML  weighting,  (3-20),  indeed  achieves  (3-24).  The 
determination  of  the  variance  of  delay  estimates  close 
parallels  a clever  method  of  MacDonald  and  Schultheiss 


(1969)  . 


Equivalent  to  the  Var  D =Var  x 


P (shown  in 


ly 


Figure  3-5)  is  the  left  to  right  variation  of  the  zero 
crossing  of  the  derivative  of  the  GCC  function  output 


Figure  3-5  Variation  of  Delay  Estimator 
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with  rospoct  to  t (shown  in  Fi^^u^<'  3-6 ) . Typical  moan 
output  of  the  derivative  of  the  correlator  output,  z, 
is  pi  otted  in  Figure  3-5  together  with  similar  cvirves 
a „ above  and  below  the  mean.  For  a,,  small,  so  that 

/j  /j 

curves  are  approximately  linear  between  D-og  and  D+Op. 

the  magnitude  of  the  expected  value  of  the  slope  of  the 
output  at  the  true  value  of  delay  is  given  by 


E R*®  (T) 

’‘1*2 


where  a denotes  standard  deviation.  Again 


0 

z 

0 

t=D 

T 

J (f)=  G (f) 

X1X2'  X1X2' 


X^(f)X2(f) 


I = G 

X..X, 
J 1 ^ 


(f),  it 


e-j2"^  that 


, (3-25) 

t=D 

using 

fol lows  with 


«x  X 

L 12  . 


t=D 


=T/  (2iTf)' 


G (f ) 

X1X2 


W^(f)df . (3-26) 


In  order  to  solve  (3-25)  for  it  is  also 

necessary  to  solve  for  0 in  Figure  3-6.  The  1 undamt'nta  1 

z 

problem  is  to  find  the  variance  of  the  random  variable 
z given  by 

T 

z - / y^^(t)y2(t)dt  . (3-27a) 

o 

(For  our  particular  problem  we  will  later  assume  that 
y^(t)  is  the  output  of  a filter  excited  by  x^(t)  and 
y«(t)  is  the  output  of  a filter  excited  by  Xo(t).) 


The  variance  of  z is  given  by 


69 


(3-27b) 


where 


and 


E[z]  = E [ / yj^(t)y2(t)dt] 
o 

= E [yj^(t)y2(t)]  dt 
o 

= T R (0) 


T T 


(3-27C) 

(3-27d) 

(3-27e) 


E[z  ] = / / E[y- (t- )y2(t- )y^(t2)y2(t2>]  dt^dt  . (3-27f) 
o o 

Evaluation  of  the  fourth  moment  in  (3-27f)  can  be  achieved 
under  Gaussian  assumptions.  In  particular,  if  y^(t) 
and  y2(t)  are  jointly  Gaussian  (and  stationary),  then 


dt,dt„  • 


y^72  \ ''2^‘V2yi  ^ ''2^J“''1“''2 
Letting  T“tj^-t2  and  using  (3-27b)  and  (3-  27e),  (3-27g) 
becomes 

oa  OQ 

2 


(3-27g) 


-CO  -»[\yi^^^\y2(T)+\y2^^^%2y 

4'(T+t2)'i'(t2)dTdt2  ^ 


(t) 

1 


(3-27h) 


where 


H-Ct) 


1 , tE(0,T) 

fO  , elsewhere  . 


Integrating  (3-27h)  with  respect  to  t^  and  manipulating 


yields 
2 _ 
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For  l:irf?e  T (3-27i) 
2 


OO 


'loo'  y 1^1  ^2^2 

By  Parseval's  Theorem 

m 

2 


If  y (t)  is  the  output  of  a filter  K^(f)  cascaded  with  a 
differentiator  and  72^^)  is  the  output  of  a filter  H2(f) 
cascaded  with  a variable  delay,  then 


(3-27k) 


(f ) = 

|H,(t)|^  (2.f)2 

(3-271 ) 

, (f)  = 

(3-27m) 

>> 

CM 

>> 

1 (f)  = 

^1^2 

H^(f)H*(f)e'^^''^V,^  ^ (f)  . 

X 2 

(3-27n) 

For  T “D  it  follows,  from(3-27k)  - (’3-27n),  since 

♦ 


'.V 


( f )=H^ (f )H„(f).  that 

g X ^ 


CO 

= t; 

w„(f) 

_co 

t-D 

g 

'(2Tif)\  ^ (f)G  ^ UKl-G^gvOldf  .(3-270) 
^22 


CombininR  (3-25)  through  i3-27o)  yields 

2 


0^=0. 


{L 


(2Tif)  Gx^x.,  ( f )Gx2X2^  ' 


(3-28) 


t-D 


1 «o  2 

(T)^;  (2Tif) 


G (f ) 

XfX^ 


W (f)  df 
B 


which  is  valid  for  any  ^^(f).  By  substituting  the 
appropriate  weighting  function  into  (3-28)  the  standuid 
deviation  of  time  delay  estimates  from  each  processor 
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can  be  analytically  evaluated. 

Parenthetically,  we  note  that  the  results  (3-28) 
with  a particular  weighting  (3-20)  can  be  related  to 
(?0)  of  MacDonald  and  Schultheiss  (1969)  as  follows. 
Define  the  bearing  to  an  acoustic  source,  similar  to 
(3-2),  as 


0 = 


(3-29) 


where  C is  the  (nominal)  speed  of  sound  in  the 
nondispersive  medium.  Consider  the  case  where  the 
estimated  D equals  the  true  delay  D plus  a perturbation 
n.  By  a Taylor  series  expansion  it  follows  that 


arccos  (D+n ) ] =arccos [^D] + 

dD 


a y*r*r*r\c 


(D-D)  . (3-30) 

D=D 


Thus  the  bearing  error 


- arccos  (^(D+n)]  - arccos^^D^ 


= iL 


and  r 


dsinS 


■(D-D)  , 


|^\^ar  . 


(3-31a) 

(3-31b) 

(3-32) 


dsin0 

The  term  dsinS  can  be  viewed  as  the  effective  array 
length  (sensor  separation)  physically  steered  at  the 
source.  Assuming  equal  noise  spectra,  combining  (3-32) 
with  (3-28)  and  (3-20),  and  introducing  a change  of 
variables  yields  an  expression  which  agrees  with  (20) 
of  MacDonald  and  Schultheiss  (1969)  whan  9 is  interpreted 
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; 

as  source  (not  wavefront)  anRlo.  Combining  (3-28)  and 
I ' (3-32)  suggests  that  in  order  to  reduce  to  variance  oi 

th(‘  bearing  estimate  the  observation  period  and  the 
" : sensor  separation  should  be  made  as  largo  as  possible. 

“ (In  practice,  there  will  undoubtedly  be  limitations  on 

; both  sensor  separation  and  observation  time.)  Further, 

’ ' since  (3-32)  depends  on  the  effective  array  length 

: physically  steered  toward  the  source,  this  suggests  the 

i desirability  of  sensor  mobility  to  maximize  the  term 

5 dsin0. 

: It  has  been  shown  that  the  variance  of  the  time 

' I delay  estimate  In  the  neighborhood  of  the  true  delay, 

' tor  general  weighting  function  Is  given  by 


8 A ioo 

|V*> 

Var  D = — 

- 

r 

I 

i 

1 2 

T 

I (2lTf)^ 
«00 

X 

^12 

W (f)  df 
g 

which  for  real  processes  may  also  be  written 


00 

/ 

Q. 

00 

1 1 

1 2 2 

oc 

T 1^0 

X 

’‘l  2 

W (f)f  df] 
g ' 

Notice  that  a scale  factor  change  in  W^(f)  does  not 
change  the  variance  of  the  delay  e.stimator. 

The  variance  of  the  ML  processor  is 

-1 

Var“‘'  6 ={2T  r(2,t)%2(l)/(l-Cj2(t)l<il)  , 

o 


-33a) 


-33b) 


-34) 


4 
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which  is  the  Cramer-Rao  lower  bound  (3-24).  It  should  be 
reemphasized  that  (3-33)  and  (3-34)  evaluate  the  local 
variation  of  the  time  delay  estimate  and  thus  do  not 
ac(;ount  for  amblRuous  peaks  which  may  arise  when  the 
averaging  time  is  not  large  enough  for  the  given  signal 
and  noise  characteristics.  Indeed,  when  T is  not 
sufficiently  large,  local  variation  may  be  a poor 
indicator  of  system  performance  and  the  envelope  of  the 
ambiguous  peaks  must  be  considered^  (p.  40  of  MacDonald 
and  Schultheiss  (1969)  and  p.  41  of  Hamor  and  Hannan 
(1974)).  Further,  (3-33)  and  (3-34)  predict  system 
performance  when  signal  and  noise  spectral  characteristics 
are  known.  For  sufficiently  large  T,  these  spectra  can 
be  estimated  accurately.  However,  in  general,  (3-33) 
and  (3-34)  must  be  modified  to  account  for  estimation 
errors;  alternatively,  system  performance  can  be 
evaluated  by  computer  simulation.  Empirical  verification 
of  expressions  for  variance  has  not  been  undertaken  by 
simulation,  because  to  do  so  without  special  purpose 
correlator  hardware  would  be  computationally  prohibitive. 
For  example,  for  a given  G 


®1®1 


^n  n ^n  n 

■^l"l  "22 


a,  and  averaging  time  T,  an  estimated  GCC  function  can 
be  computed,  from  which  only  one  number  (the  delay 


^These  observations  were  brought  to  the  author's 
attention  by  C.  Stiadling  and  R.  Trueblood  of  the  Naval 
Undersea  Center,  San  Diego,  California. 
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estimate)  can  be  extracted.  To  empiiically  evaluate 
the  statistics  of  the  delay  estimate  (which  would  bo 
valid  only  for  these  particular  signal  and  noise  spectra) 
many  such  trials  would  need  to  be  conducted.  We  have 
conducted  one  such  trial  (with  T large)  and  verified 
that  useful  delay  estimates  can  be  obtained  by  inserting 
estimates  6 (f)  and  C.,„(f)  in  place  of  the  true 

values  in  (3-20).  This  might  have  been  expected  since 
the  estimated  optimum  weighting  will  converge  to  the 
true  weighting  as  T-»-“.  (The  statistics  of  the  MSC 
estimates  are  given  in  appendix  B.)  In  practice,  T 
may  be  limited  by  the  stationarity  properties  of  the 
data,  and  (3-34)  may  be  an  overly  optimistic  prediction 
of  system  performance  when  signal  and  noise  spectra  are 
unknown . 

With  these  qualifications  in  mind,  consider  the 
following  example  of  computing  the  variance  of  the  ML 
time  delay  estimate.  Let 


Then 


= (C, 

lo  , 


fc(0,B) 

otherwise 


The  strong  dependence  of  the  estimator  variation  to  the 

1-C 

coherence  is  illustrated  in  a plot  of  versus  C in 

Figure  3-7.  Note  since 


7(5 


= C I 1+C+C^+.  . . 1 , 

that  for  C<<1,  (3-36)  is 


(3-36) 


But  for  C=l-A , where  A<<! , then 

C ^ s 1 

1-C  A A * A ' 


(3-37) 


(3-38) 


An  approximate  comparison  of  C=0.01  with  C=0.99  shows 
the  variance  changed  not  by  a factor  of  100  to  1 but 
10,000  to  1.  The  implication  is  that  weakly  coherent 
signals  do  not  contribute  much  to  reducing  the  variance 
of  the  delay  estimate.  That  is  not  entirely  so  but  is 
roughly  correct.  For  example,  high  frequency,  low 
coherent  power  may  be  important.  A more  complete 
discussion  of  the  variance  of  several  proposed  time 
delay  estimators  is  given  in  Chapter  4.  Prior  to 
Chapter  4,  we  will  discuss  other  realizations  of  the 
ML  delay  estimator. 

3C.  Other  Realizations  of  the  ML  Estimator 
This  section  of  Chapter  3 will  present  four  methods 
for  implementing  the  ML  estimator  for  delay.  One  (and 
only)  of  the  methods,  the  one  considered  to  be  most 
promising^ has  been  programmed. (See  appendix  C.)  The 
program  presumes  that  signal  and  noise  waveforms  are 
real  and  that  their  statistics  are  unknown;  hence  the 
program  uses  appropriate  estimates  in  lieu  of  known 
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values,  when  forming  the  weighting  function. 

The  first  realization  whicli  comes  to  mind  is  a 
bank  of  allowable  delays  as  depleted  in  Figure  3-S. 

Each  data  waveform  Xj(t)  and  XgCt)  is  filtered  by  H^(1) 
and  HgCf),  respectively.  The  output  of  H2(f)  is 
delayed  for  several  reasonable  values  of  delay  depending 
on  the  resolution  desired,  a priori  knowledge  and 
processing  cost  allowed.  Each  delayed  output  is  multiplied 
with  the  output  of  H^(f).  After  integration  for  T seconds, 
the  delay  that  yields  the  maximum  award  is  the  estimate 
of  delay. 

The  second  method  is  to  realize  that  the  bank  of 
delays  in  Figure  3-8  corresponds  to  a particular  method 
for  computing  the  GCC  function.  Indeed  we  need  not  be 
particular  about  the  details  of  how  the  GCC  function  is 
estimated  so  long  as  it  is  estimated  "accurately." 

The  second  method  uses  the  overlapped  FFT  method 
presented  by  Carter,  Knapp,  and  Nuttall  (1973a)  to 
compute  the  estimated  cross  spectrum  and  MSC.  The 
estimated  cross  spectrum  is  appropriately  weighted  and 
inverse  transformed  via  an  FFT  to  obtain  the  estimated 
GCC  function.  The  delay  where  the  GCC  peaks  is  the 
estimate  of  delay.  One  advantage  to  methods  1 and  2 is 
that  by  computing  the  crosscorrelation  for  a large 
range  of  delays  the  presence  of  more  than  one  delay 
(acoustic  source)  can  be  observed.  There  are  other 
advantages,  too;  in  the  GCC  method  uncorrelated  cross 


3-8  Block  I'ias^rani  of  Open  Loop  ML  Time  Delav  Estimator  Realization 


79 


terms  vanish  and  there  is  no  unknown  residual  bias  to 
account  for  when  establishing  thresholds  (other  than 
the  type  discussed  in  appendix  B). 

If  we  desire  to  use  a closed  loop  control  scheme  to 

A 

automatically  adjust  the  delay  estimate  D,  we  can 
instrument  the  estimator  with  a derivative  in  one 
channel  much  like  our  discussion  of  the  variance  of  the 
estimator.^  When  we  are  in  the  neighborhood  of  the 
correct  delay,  the  output  in  Figure  3-9  should  be 
approximately  zero.  Any  difference  from  zero  (that  is, 
error)  is  fed  back,  perhaps  smoothed  and  scaled,  and 
used  to  adjust  the  delay  estimate  in  order  to  drive  the 
system  output  to  zero.  For  estimating  more  than  cine 
delay  (acoustic  source)  with  this  realization,  more 
than  one  variable  delay  is  required.  It  should  be 
noted  as  pointed  out  by  Kochenburger  (1972)  that 
differentiation  is  a "noisy"  process  which  should  be 
avoided.  However,  the  filter  H^(f)  and  the  integrator 
in  Figure  3-9  may  reduce  the  adverse  effect  of  this 
realization. 

The  final  realization  to  be  discussed  Is  the 
method  of  Carter  and  Knapp  (1976a).  In  this  method 
we  re-examine  our  derivation  in  section  3A.  In 

^hls  idea  was  brought  to  the  author's  attentif>.i 
by  J.  P.  lanniello  of  the  Naval  Underwater  .Systems 
Center,  New  London,  Connecticut. 
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For  uncorrelati^d  noises  Q does  not  depend  on  D; 

n 

therefore,  the  total  award  is  maximized  by  maximizing 


Jd  = - I / X*H*h'x  df, 


(3-43) 


where  the  1x2  vector  filter 
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By  Parsoval's  ThtJorem,  (B-'IB)  can  be  irnpl  emented 


by  filtering  x^(t)  with  filter  f)  and  filtering 
X2(t)  with  filter  H2(f),  then  summing,  squaring,  and 
averaging . 

If  we  separate  from  HgCf)  that  portion  dealing 
with  the  hypothesized  delay  we  can  realize  the  delay 
estimator  as  shown  in  Figure  3-10.  Moreover,  note  that 
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Thus,  the  estimator  can  be  realized  as  shown  in 
Figure  3-11.  For  low  SNR,  that  is,  when 


(3-46c) 


ss 


"l"l 


<<1  and  ” ^ss^^^  <<1 


I 


1 + G v'q  = 1 (3-47) 

ss  n 

then  the  filter  following  the  summation  in  Figure  3-11 
is  approximately  a unity-gain  zero-phase  all-pass 
network.  Note  in  Figure  3-11  that  the  form  of  the 
filters  at  each  sensor  depends  on  the  signal  and  noise 
spectrum.  In  particular  the  estimation  of  D presented 
here  requires  filtering  in  exactly  the  fashion  as  the 
detection  of  a signal  arrival  presented  by  Knapp  (1966). 


These  low  SNR  filter  forms  are  commonly  referred  to  as 
Eckart  filters  after  early  work  done  in  the  detection 
area  by  Eckart  (1952). 


igure  3-10  Filter  and  Sum  Realization  of  ML  Time  Delay  Estimator 
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CHAPTER  4 


COMPARISON  OF  THE  ML  ESTIMATOR  TO  OTHER  PROPOSED 
SUBOPTIMUM  PROCESSORS 

The  objective  of  Chapter  4 is  to  compare  the  ML 
time  delay  estimator  with  several  other  processors 
that  have  been  proposed.  From  Chapter  3,  we  know  that 
the  ML  processor  will  have  the  minimum  local  variation. 
Also,  the  previously  derived  expressions  for  the  local 
variation  of  any  correlation  processor  can  be  used  to 
analytically  compare  other  intuitively  appealing 
correlation  processors.  Additionally,  the  effect  of 
erroneously  identifying  the  signal  spectrum  will  be 
investigated,  since  that  will  cause  the  selection  of  an 
erroneous  weighting  function. 

The  first  section  of  this  chapter  presents  the 
motivation  for  the  use  of  crosscorrelation  processors. 

The  second  section  compares  several  such  processors, 
and  the  third  section  considers  the  interrelationships 
of  these  various  processors. 

4A.  Motivation  for  Crosscorrelation  Processors 

For  the  model 

Xj^(t)  * Sj^(t)+nj^(t)  (4-la) 

XgCt)  ==  aSj^(t+D)+n2(t)  (4-lb) 

one  common  method  of  estimating  the  time  delay  D is  to 
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compute  the  crosscorrelation  function 

\ X (T)=E[x^(t)x2(t-T))  , (4-2) 

1 2 

where  E denotes  expectation.  The  argument  t that 
maximizes  (4-2)  provides  an  estimate  of  delay.  For 
models  of  the  form  of  (4-1),  the  crosscorrelation  of 
Xj^(t)  and  Xg(t)  is 


I „ (t)=oR  (t-D)-!R 
^1^2 


"l'^2 


(T) 


(4-3) 


The  Fourier  transform  of  (4-3)  gives  the  cross-power 
spectrum 


^1^2 


(f)-aG„  ^ (f)e’J2”^^+G_  „ (f) 


SiSi 


niU2 


(4-4) 


If  n.,(t)  and  n«(t)  are  uncorrelated  (G  (f)*0),  the 

cross -power  spectrum  between  Xj^(t)  and  Xg(t)  is  a scaled 
signal  power  spectrum  times  a complex  exponential.  Since 
multiplication  in  one  domain  corresponds  to  convolution 
in  the  transformed  domain  (see,  for  example,  Oppenheim 


and  Schafer  (1975)),  it  follows  for  G (f)=0  that 

'^l"2 


^1^2 


(T)  -oR„  ^ (T)  @4(r-D) 


®1®1 


(4-5) 


One  Interpretation  of  (4-5)  is  that  the  delta 
function  has  been  spread  or  "smeared"  by  the  Fourier 
transform  of  the  signal  spectrum.  If  Sj^(t)  is  a white 
noise  source,  then  its  Fourier  transform  is  a delta 
function  and  no  spreading  takes  place.  An  important 
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( 

i 

i property  of  autocorrelation  functions  is  that 

\ i R (t)<R  (0).  Equality  will  hold  Cor  cortain 

; I 

i 

^ I Tor  periodic  functions  (see,  for  example,  Davenport 

! (1970),  pp.  323-326).  However,  for  most  practical 

applications,  equality  does  not  hold  for  t^O,  and  the 

I 

j true  crosscorrelation  (4-5)  will  peak  at  D regardless  of 

whether  or  not  it  is  spread  out.  The  spreading  simply 
acts  to  broaden  the  peak. 

In  fact,  more  generally,  when  x^^(t)  and  Xg(t) 
have  been  filtered  by  and  Hg,  respectively,  then  the 
cross-power  spectrum  between  the  filter  outputs  is 
given  on  p.  399  Davenport  (1970)  as 

I 

i 

Therefore,  the  GCC  between  x^(t)  and  XgCt)  is 

00 

^ X df  , (4-7a) 

*12  -®  ® *12 

where 

Wg(f)=H^(f)H*(f)  (4-7b) 

I 

denotes  the  general  frequency  v/eighting.  The  particular 
weighting  selected  is  denoted  by  a change  in  the  sub- 

I script  g. 

1 

For  all  of  the  proposed  weightings  which  we  will 
investigate.  W(f)=W*(f)  and  W(f)=W(-f);  that  is,  V/(f)  is 
real  and  even.  These  properties  are  also  held  by  tho 
minimum  variance  ML  weighting. 
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To  distinguish  which  of  the  proposed  general 
weightings  has  been  applied,  we  denote 


G (t)=G«  (f) 


and  thus 


(4-8:0 


V W (f)[aG^  ^ (f)e  (f)]  . (4-8b) 

VlYg  g s^s^  n^n2 

When  the  noises  are  incoherent,  taking  the  Fourier 
transform  of  (4-8b)  yields 


^ (t)=R_(t)  0 aR„  „ (T)©«(r-D)  , 


^1^2 


ww 


SiSi 


(4-9) 


where  R (t),  the  inverse  Fourier  transform  of  W^(f),  is 
even.  This  being  the  case,  the  true  GCC  will  also  peak 
at  D regardless  of  the  specific  weighting.  Thus  one 
might  be  puzzled  as  to  why  any  weighting  is  needed. 

Indeed,  the  crosscorrelation  function  alone  is  a useful 
technique  for  estimating  time  delay. 

Two  practical  reasons  why  prefiltering  is  desirable 
are  evident.  If  the  noise  is  coherent,  for  example, if 


"l"2  ®2®2 


(4-10) 


then 


R?  . (0=R„„„(t)  0 (aR^  ^ (T)  0 «(r-D) 


^1^2 


ww 


SjS, 


A ^ 


(4-11) 


It  is  clear,  from  (4-li),  that  the  convolutions  by 

R (t)  and  R (t)  will  produce  two  peaks  which  may 
®1®1  ^2®2 
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bo  spread  Into  one  another.  The  convolution  by  R (t) 

W^V 

tan  aid  to  undo  this  smearing.  For  a single  delay 
broadening  of  the  delay  peak  may  not  bt  a serious 
problem.  However,  when  the  signal  has  multiple  delays, 
the  true  crossccrrelation  is  given  by 


(4-12) 


In  this  case  also,  the  convolution  with  R (t)  can 

®1®1 

spread  one  delta  function  into  another,  thereby  making 
it  impossible  to  distinguish  peaks  or  delay  times.  Under 
ideal  conditions  where  Vfp  (f)=G  (f),  W (f)  should 

be  chosen  to  ensure  large  sharp  peaks  in  R (t)  rather 

^1^2 

than  a broad  one  (see  Figure  4-1),  since  this  will  ensure 
good  time  delay  resolution. 

There  is  a second  important  reason  why  prefiltering 

A 

is  desirable.  In  practice,  only  an  estimate  G „ (f) 

XiXg 

of  G (f)  can  be  obtained  from  finite  observations  of 
*12 

x^(t)  and  X2(t).  Thus  we  can  never  exactly  obtain  the 

crosscorrelation  from  a limited  amount  of  time  data. 

Because  of  the  finite  observation  time,  then,  R „ (t) 

*1*2 

can  only  be  estimated.  For  example,  for  real  ergodlc 
processes  an  estimate  of  the  crosscorrelation  is  given 
on  p.  327  of  Papoulls  (1965),  as: 

®x,x  ^ \(t)X2(t-T)dt  , 

1 ^ T 


(4-13) 
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where  T represents  the  observation  interval.  For  limited 
duration  data  records,  the  accuracy  of  the  delay  estimate, 
D,  can  be  improved  by  prefiltering  Xj^(t)  and  Xg(t) 
prior  to  the  Integration  in  (4-13).  In  practice  we  can 
compute  (4-13)  by  weighting  the  estimated  cross  spectrum 
and  computing  an  inverse  Fourier  transform  to  obtain  an 
estimated  GCC  as  follows; 

00 

(T)-/  W rf)G^  ^ (f)e^2’'^^df  . (4-14) 

12  -«  * 12 

W„(f)  now  serves  to  improve  the  estimate  of  R (t) 

g Xj^Xg 

used  to  estimate  time  delay. 

In  practice,  depending  on  the  particular  form  of 
W^(f)  and  the  a priori  info:fmation , it  may  also  be 
necessary  to  estimate  W (f).  For  example,  when  the  role 
of  the  prefilters  is  to  accentuate  the  signal  passed  to 
the  correlator  at  those  frequencies  at  which  the  SNR 
is  highest,  then  W (f)  can  be  expected  to  be  a function 
of  signal  and  noise  spectra  which  must  either  be  known 
a priori  or  estimated. 

Hence,  we  see  that  the  true  crosscorrelation 
function,  for  the  model  (4-1),  is  sufficient  to 
determine  the  correct  time  delay;  but  for  practical 
(finite  data)  considerations  it  is  desirable  to  prefilter 
Xj^(t)  and  Xg(t)  prior  to  crosscorrelation.  Indeed,  the 
problem  of  selecting  W (f)  to  optimize  certain  performance 
criteria  is  not  new  and  has  been  studied  by  several 
investigators.  (See,  for  example,  Akalkc  and  Yamanouchi 
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(1963),  Bangs  (1971),  and  Hannan  and  Thomson  (1971).) 

Our  intuitive  discussion  of  sharply  peaked 
estimators  may  suggest  certain  types  of  weighting. 
However,  sharp  peaks  are  more  sensitive  to  errors 
introduced  by  finite  observation  time,  particularly  in 
cases  of  low  SNR.  Thus,  as  with  other  spectral 
estimation  problems,  the  choice  of  W (f)  is  a compromise 

D 

between  good  resolution  and  stability.  In  the  subsequent 
section  we  compare  several  promising  weighting  functions 
proposed  previously  in  the  literature. 

4B.  Comparison  of  Proposed  Processors 
The  preceding  discussion  provides  background  for 
the  role  that  W (f)  is  to  play.  Now  the  six  versions 

D 

of  the  generalized  crosscorrelation  function  listed  in 
Table  4->l  will  be  examined  individually.  In  the  process 
of  comparing  the  processors  in  Table  4-1,  there  will  be 
a tendency  to  want  to  look  at  some  simple  cases,  for 
example,  equal  white  noises  and  strong  (or  weak)  white 
noise  signals.  In  this  regard,  it  can  be  shown  for  the 
case  where  G „ (^)*G»  „ (f)=G  „(f)  is  equal  to  a 

constant  times  G _ (f)  (whether  or  not  the  signal  is 

®1®1 

white)  that  five  of  the  processors  in  Table  4-1  provide 
for  the  identical  frequency  weighting,  except  for  a 
constant.  (The  crosscorrelation  processor  (W(f)=l,Vf) 
is  a delta  function  smeared  out  by  the  Fourier  transform 
of  the  signal  (noise)  power  spectrum.)  In  these  cases, 


Processor  Name 


1.  Roth  Impulse  Response 


2.  Smoothed  Coherence 
Transform  (SCOT) 


3.  Phase  Transform  (PHAT) 


4.  Crosscorrelation 


5.  Eckart 


6.  Maximum  Likelihood 
(ML) 


I 

i 

I 


Weight 

W(f)=H^(f)H*(f) 
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the  delay  estimate  from  each  of  these  five  processors 
will  have  the  same  variance.  Hence,  a complete  comparison 
can  only  be  made  when  detailed  signal  and  noise  character- 
istics are  provided.  Such  information  is  largely 
dependent  on  the  particular  application  ana  a detailed 
comparison  is  therefore  beyond  the  intent  of  this  work. 

For  underwater  acoustic  applications,  characteristics 
of  the  radiated  and  self  noise  of  ships,  submarines,  and 
torpedoes  and  the  noise  background  of  the  sea  are  given 
by  brick  (1967).  For  more  fundamental  signal  and  noise 
characteristics,  it  is  useful  to  r jvide  a brief  example 
of  using  (3-33)  and  (3-34).  Suppose  the  example 

corresponds  to  (4-1)  where  a"l;  G (f)=l,  Vfe(-B,B) 

ss 

otherwise  G__(f)»0;  G_  _ (f)*G  (f)*l,Vf.  it  follows 

88  nj^n^  DgUg 

from  (2-1)  and  (2-2)  that 


Ci2(f) 


>°ss”>*Vn  Cnl 


.(-1-15) 


‘11 


■2"2 


Hence, 


Ci2(t) 


jO.25  , Vf  (-B.B) 

(o  , otherwise  . 

Other  values  are  given  in  Table  4-2. 

4B1.  Roth  Processor 


The  weighting  proposed  by  Roth  (1971) 
1 

- rn  ' 

^1^1 


WRvf)  = Q 


(4-16) 


where  the  subscript  R is  to  distinguish  the  choice  of 
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Wg(f),  yields^ 


(f) 

1^2  J27rfT.. 
/ ^ \ ® dr 


XiXi 


m 


14-17) 


Equation  (4-17)  estimates  the  impulse  response  of  the 
optimum  linear  (Wiener-Hopf ) filter^ 


»m(^>=-G 


^1*2 


(f) 


XiXi 


(f) 


(4-18) 


which  "best"  approximates  the  mapping  of  Xg(t)  to  Xj^(t) 
(see,  for  example,  Van  Trees  (1968),  Carter  and  Knapp 
(1975)  and  the  discussion  of  Theorem  2-3  ) . If  nj^(t)?‘0, 
as  is  generally  the  case  for  (4-1),  then 


’‘1*1 


and  ideally 


(4-19) 


(R) 

I 

^1^2 


(t)-6(t-D)  G I 


aG  ^ (f) 
®1®1 


®1®1 


mrs— Tfr^'"''"<4-20) 


"l"l 


Therefore,  except  when  G „ (f)  equals  any  constant 

"l"l 


(including  zero)  times  G (f),  the  delta  function  will 

®1®1 


again  be  spread  out.  The  Roth  processor  has  the 
desirable  effect  of  suppressing  those  frequency  regions 


^As  discussed  earlier,  W(f)  may  have  to  be  estimated 
for  this  processor  and  those  which  follow,  because  of  a 
lack  of  a priori  information.  In  this  cjse,  (4-16)  may 
require  that  G (f)  be  replaced  with  G (f). 


I 


where  G „ (f)  is  large  and  6^  ^ (f)  is  therefore 


Vl 


more  likely  to  be  in  error. 
From  (3-33), 


00  < (l-C)f^df 

R /-  x-x- 

Var(f))=  — — 


/ 0.x, 

L o 12 


In  the  example  of  Table  4-2  this  becomes 


® 2 3 ^2 

f t T df+/p  f Idf 

O 4 D 


8tt^  T 


“ “1  ( 

rB^2  X ^ * 

[o'  ^ J 


B^+  4h^-4b^ 

. — 2 — 2 , (4- 

9 

when  B=H  (4-22b)  agrees  with  13-35)  as  expected;  but 

if  H is  large  in  comparison  with  B,  the  variance  of  the 

Roth  processor  will  be  large  in  comparison  to  the 

Crame^r-Rao  bound  (3-24). 

4B2.  Smoothed  Coherence  Transform 

Errors  in  G (f)  may  be  due  to  frequency  bands 
^1^2 

where  G (f)  is  large,  as  well  as  bands  where 
G (f)  is  large.  One  is  therefore  uncertain  whether 


to  form  W„(f)=l/G^  ^ (f)  or  W (f)=l/G  „ (f):  honco. 

It  ” ^2^2 
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the  smoothed  coherence  transform  (SCOT)  proposed  by^ 
Carter,  Nuttall,  and  Cable  (1973)  yields 


Wg(f)  = 1/ 


V X X 

1 *2  2 


(f) 


This  weighting  gives  the  SCOT 


.(s) 

^x  X 


(T)  =/  ^ (f)e*^^"^"df  , 


00  ^1^2 


where  the  coherence  estimate' 


(4-23) 


(4-24) 


^x  X 


(f) 


■ ' ~XoX 


f)G^  ^ (f) 
2 2 


(‘1-25) 


For  and  H^(f  j , the 


^2^^2 


SCOT  can  be  interpreted  as  prewhitening  filters  followed 

by  a crosscorrelation.  When  G (f)=G  (f),  the 

*2^2 


SCOT  is  equivalent  to  the  Roth  processor.  If  n^(t)^0 
and  P2(  1)^*0,  the  SCOT  exhibits  the  same  spreading  as 
the  Roth  processor. 


iThe  SCOT  was  originally  proposed  by  G.C. Carter, 
A.H.  Nuttall,  and  P.G. Cable  in  1972  and  successfully 
applied  to  actual  data  by  G.C. Carter  and  P.G. Cable  in 
1972  and  Brady  (1973)  for  part  of  his  Ph.D.  work. 

more  standard  coherence  estimate  is  formed 
when  the  autospectra  must  also  be  estimated,  as  is 
usually  the  case.  (See  Carter,  Knapp  and  Nuttall  (1973a),) 
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From  (3-33) 


Var(D)= 


/ [1-C(f)]df 


r 1 

2 

8rr^T 

^"^2^(f)df 

L 0 -1 

(4-26) 


Note  as  C(f)->’1,  the  numerator  becomes  small  and  the 
denominator  becomes  large.  For  our  example,  since 
G (f)=G  (f)  the  SCOT  has  the  same  variance  as 

the  Roth  processor. 

4B3.  Phase  Transform 


To  eliminate  the  spreading  evident  above,  the 
phase  transform  (PHAT)  uses  the  weighting^ 

1 


W (f)  = 
P 


^x  X 
*1*2 


(4-27) 


which  yields 


.(P) 

R ^ 
*12 


« '^x  X 

. 12  _j2irfT^, 

^ i'g:  7Tf)'^  ‘ 


x,x. 


(4-28) 


For  the  model  (4-1)  with  uncorrelated  noise  (that  is, 

Gn  n (f)*0). 

"l"2 


G (f) 
*1*2 


=aG  (f)  . 
®1®1 


(4-29) 


^he  PHAT  was  originally  suggested  by  G.C. Carter, 
A.H.  Nuttall  and  P.G.  Cable  in  1972. 


t 
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Ideally,  when  G (f)-G  (f), 

*12  *12 


X 

,12  _ J(J>(f)_J2TTfD 

srrTTT'  ' 

*1*2 


has  unit  magnitude  and 


(0=6(t-D) 

*1*2 


(4-30) 


(4-31) 


The  PHAT  was  developed  purely  as  an  ad  hoc 
technique.  Notice  that,  for  models  of  the  form  of 
(4-1)  with  uncorrelated  noises,  the  PHAT  (4-28), 
ideally,  does  not  suffer  the  spreading  that  other 
processors  do. 

From  (3-33), 

(P).  h (1-C)  df 

Var(D)  . ■ -f  - „ • (4-32) 

Sn'^T  j^/  f^dfj 

As  C-^1 , ■‘■Of  so  the  processor  will  behave  well 

(that  is,  low  variance).  However,  as  expected,  as  C-*^0 
the  variance  grows  without  bound.  For  the  example  in 
Table  4-2,  assuming  the  weighting  is  zero  for  f>H  , 


(p)- 

Var(D)» 


" hi'd 


8Tf^T 


H 

/ 

- o 


f^df 


(4-33) 


Except  when  H»B,  this  processor  will  suffer  a complete 
breakdown  as  C tends  to  zero.  When  H=E,  we  obtain  the 
same  variance  as  the  Roth  and  SCOT  processors  for  then 


102 


(as  indicated  earlier)  G (:f)=G  (f)=G  (f)  and 

"22  ®1  1 

all  processors  behave  equally  well,  For  models  of  the 
form  of  (4-1),  the  poor  behavior  of  the  PHAT  suggests 
that  W(f)  should  not  be  inversely  proportional  to 
signal  power.  The  crosscorrelator  is  one  method  of 
avoiding  the  application  of  weight  inverse  to  signal 
characteristics.  Two  other  processors  in  Table  4-1 
also  assign  weights  or  filtering  proportionate  to  SNR: 
the  Eckart  filter  (Eckart  (1952))  and  the  ML  estimator 
or  processor  of  Hannan  and  Thomson  (1973).  We  now 
examine  these  three  processors  in  depth. 

4B4.  Crosscorrelation 
The  variance  of  the  delay  estimate  from  the 
crosscorrelation  processor  is 


XC  - / f^*^x^x.^^"x2X2(l-C)df 

Var(D)=  ° 


1 

r 

00 

to 

8 

to 

G Idf 

1 

L ° 

^^1^21  -J 

(4-34) 


For  the  example  case  in  Table  4-2,  (4-34)  yields 


XC  . 

Var(D)  = 


“2  3 H 2 

/ f^-4-jdf+/  f'^ldf 

o B 


® 2 

STT^lj/  f'^-  Idf 

b3  B^ 

® ^ 3 ■ 3 


(4-35a) 


Sir  T 


(4-35b) 
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For  H=B,  (4-35b)  agrees  with  earlier  results.  The 
crosscorrelator  actually  performs  better  than  either 
the  SCOT  or  the  Roth  processor  for  the  particular 
example  case  in  Table  4-2.  In  general,  one  can  expect 
to  find  cases  for  particular  spectra  where  the  cross- 
correlator performs  worse  than  the  SCOT  or  Roth  processors. 

4B5.  Eckart  Filter 

The  Eckart  filter  derives  its  name  from  work 
in  this  area  done  by  Eckart  (1952).  Derivations  in 
Knapp  (1966),  and  Nuttall  and  Hyde  (1969),  are  outlined 
here  briefly  for  completeness.  The  Eckart  filter 
maximizes  the  deflection  criterion,  namely,  the  ratio 
of  the  change  in  mean  correlator  output  due  to  signal 
present  to  the  standard  deviation  of  correlator  output 
due  to  noise  alone.  For  long  averaging  time  T,  the 
deflection  has  been  shown  to  be 


r 00 

1 2 

L ; H (f)H2vf)G 

s 

L-oo 

] 

2 J 

00 

2 

2 

f H (f) 
^00 

H2(f) 

G (f)G  (f)df 

"2"2 

where  L is  a constant  proportional  to  T,  and  G (f) 

®1®2 

is  the  cross-power  spectrum  between  Sj^(t)  and  S2(t). 

For  the  model  (4-1)  G (f)=aG  ( f )exp(  j2irfD) . 

®1  2 ®1  1 

Application  of  Schwartz's  inequality  indicates  that 
H^(OH*(f)=Wg(f)e^j^^^® 


(4-37) 


i 
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maximizes  where 

ctG  ( f ) 

Wg(f)  = G (f)G  (f)  • 

Notice  that  the  weighting  (4-38),  referred  to 
as  the  Eckart  filter,  possesses  -jorne  of  the  qualities 
of  the  SCOT.  In  particular,  it  acts  to  suppress 
frequency  bands  of  high  noise,  as  does  the  SCOT.  Also 
note  that  the  Eckart  filter  unlike  the  PHAT  attaches 
zero  weight  to  bands  where  G (f)=0.  Tn  practice, 

the  Eckart  filter  reouires  knowledge  or  estimation  of 
the  signal  and  noise  spectra.  For  (4-1),  when  a=l  this 
can  be  accomplished  by  letting 


V')- 


The  variance  of  the  time  delay  estimate  using  Eckart 
filtering  is 


2 — r X ^x  X (l-C)df 

Var(D)  • --2-^-1 -i--2  .2 


r® 
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f f 
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ss 
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^l’‘2 

n G 

Hini  n2n2J 

For  the  example  case  in  Table  4-2, 

E ^ .^,.2.3  .. 

Var(D)=  ■'o^  ^4 

n f B „ 1 2 


21®  2 1 
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nil) 


that  is,  for  this  example  the  Eckart  filter  aehieve.' 
the  Cramer-Rao  lower  bound  (3-24).  In  general  this 
will  not  always  occur.  In  the  next  section  we  see 
that  (4-41b)  is  the  variance  achieved  by  the  ML 
processor.  This  might  be  expected  since  both  the 
Eckart  and  ML  processors  pass  nothing  in  the  signal 
frequency  band  (B,H)  and  both  have  constant  weighting 
over  the  band  (0,B).  Actually,  the  ML  estimator  is 
closely  related  to  the  Eckart  filter,  as  will  he  seen 
in  section  4C  of  this  chapter, 

4B6.  Maximum  Likelihood  Processor 
As  shown  in  Chapter  3 the  ML  processor  always 
has  minimum  variance.  For  the  Table  4-2  example,  the 
correct  weighting  from  (3-20)  is  W(f)=l/3  for  f£r(-B,B) 
and  zero  otherwise.  Now  from  (3-34) 


ML  / \ 
Var  (D) 


1-1 


(1-42) 


Thus,  the  minimum  variance  depends  on  a time  bandwidth 

2 

product,  TB  multiplied  by  the  bandwidth  squared,  B . 
Suppose  an  error  had  been  made  identifying  the  frequency 
band  of  the  signal.  Then  if  we  presumed  that  the 
weighting  was  W(f)  = l/3  for,  say,  fc(-aB,aB),  in  lieu  oi 
f (-B,B),  we  would  obtain  from  (3-33) 
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when  a^l 
Var(D)  = 


T,2b3 


which  reduces  to  (4-42)  when  a=l.  For  example,  in  this 
case,  a 10  percent  error  (that  Is,  a=l.l)  leads  Lo  more 
than  an  11  percent  Increase  in  variance.  If  a£l  then 
(3-33)  becomes 


Var(D)  = I TitV 


which  agrees  with  (4-42)  when  a=l.  Thus  a 10  percent 
error  (a=0.9)  leads  to  an  increase  in  variances  of 
37  percent.  Thus  our  example  suggests  it  may  be  more 
desirable  to  let  in  extra  noise  than  to  omit  signal 
power,  Finally,  if  our  error  led  to  processing  the  band 
fe(aB,B)  and  fe(-B,-aB),  we  would  obtain 


Var(D) 


1 8 _2  3 

7^  9 ® 

1-a 


which  agrees  with  (4-42)  when  a=0. 


The  ratio  of  variances  (4-45)  to  (4-42)  for 


a<<l  is 


If  we  again  err  by  10  percent  (i.e.,  a=0.1),  then  (4-46) 
yields  1.001  or  little  change  in  the  variance.  (This 
error  is  at  lower  frequencies  in  the  signal  band  and 
as  (3-33)  suggests,  proper  weighting  is  most  critical 
at  higher  frequencies.)  Thus,  for  this  example, 
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depending  on  how  we  make  a 10  percent  error  in  freqm'nc.y 
band  selection,  we  can  have  anywhere  from  0.1  percent 
lo  a 37.0  percent  increase  in  variance  of  the  time 
delay  estimate. 

4C.  Interpretation  of  Relationship  Between 
Correlation  Processors 

For  the  case  where  a=l 


^ (f)  ^ (f)+G„  „ (f) 


G (f ) 
®1®1 


®1®1 


1 . 


^i^r  ' "i"i 


(t)+G  (f) 

^2^2 


-°s  s <e( 
^11  i 


(4~47ci  ) 


s.,  s. 


(f) 


Gn  n n 

”l”l  "2^2 


1+ 


®1®1 


(f) 


% s 


(4-47b) 


G (f) 
"2"2 


"l"l 


(f) 


which  agrees  with  equation  (28)  of  MacDonald  and 

Schultheiss  (1969)  if  in  (4-47b)  G „ (f)=G„  „ (f) 

"l"l  "2"2 

For  low  SNR, 


G (f)  G (f) 

®1®1  ®1®1 
6— TTfT  G.  : (f)  ■ 


"l"! 


*^2"2 


it  follows  that 


s 

SjSi 


"l"l 


(f)G„  „ (fT 


"2*^2 


WE(f) 


(4-48) 


Notice  that  agreement  require.s  a = l. 
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that  is,  for  a=l  and  low  SNR,  tlie  ML  processor  is 
Identical  to  the  Eckart  filter.  Similarly,  for  low 
SNR, 


W^(f)  = 


"l"l 


^2^2 


(4-49) 


Therefore,  if  a=l. 


G (f ) 


i "l"l  "2^2 


W^(f)  . 


(4-50a) 


Furthermore,  for  G „ (f)=G  (f)=G  (f), 

"i"l  V2 


2 

G ... 
s^s.,  (f ) 

Wf'  ■ -G JT  - V*)  • 

Thus,  under  low  SNR  approximations  with  a*l,  both  the 
Eckart  and  ML  prefilters  can  be  interpreted  either  as 
SCOT  prewhitening  filters  with  additional  SNR  weighting 
or  PHAT  prewhitening  filters  with  additional  SNR  squared 
weighting. 

We  can  rewrite  (4-47)  as 


G (f) 

SfSi 

~G  TfT“ 

nn^ 


Wp(f) 


(4-50b) 


aI  G G 

1 njni  ngng 

/ G G /G  G |G  G 

V "l"l  "2^2  "l”l  "2^2  J "l"l  "2": 


(4-51) 


®1®1 


G 


"l"l 


"2"2 


for  uniformly  high  SNR, 
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lim 


yl 


®1®1 


^0 


"2*^2 


+G 


"l"l 


(‘1-52) 


that  Is,  giving  the  weighting  characteristics  similar 
to  the  SCOT  at  low  SNR.  Note  that,  like  the  ML  processor, 
the  PHAT  computes  a type  of  transformation  on 


G (f ) 

XiX  ' 

7? ^1-  exp 


J4>(f) 


(4-53) 


However,  the  ML  processor,  like  the  SCOT,  weights  the 
phase  according  to  the  strength  of  the  coherence.  From 
p.  379  of  Jenkins  and  Watts  (1968),  comparing  (B-22) 
with  equation  (9.2.19)  and  (9.2.20)  of  Jenkins  and  Watts 
(1968)  the  variance  of  the  phase  estimates  is  given  by 


Var  Mf)6  . 

where  N is  the  number  of  independent  FFTs  used  to 

A 

estimate  phase.  Notice  as  C-^1,  Var  ^ ->0.  Thus, 
(ML) 


(4-54) 


Var  (j)(f) 


(1-55) 


Comparison  of  (4-55)  with  (4-53)  reveals  that  the  ML 
estimator  is  the  PHAT  inversely  weighted  according  to 
the  variability  of  the  phase  estimates. 

The  ML  processor  has  been  compared  with  five 


other  candidate  processors  to  demonstrate  the  inter- 
relation of  all  six  estimation  techniques.  The 
derivation  ot  the  ML  delay  estimator  (in  Chapter  3), 
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together  with  its  relation  to  various  ad  hoc  techniques 
of  intuitive  appeal  (in  this  chapter),  suggests  the 
practical  significance  of  ML  processing  for  estimation 
of  time  delay  and,  thence,  bearing.  The  remainder  of 
this  thesis  deals  with  extensions  of  the  ML  processor 
to  more  complex  models  and  a discussion  of  the  results 
and  suggestions  for  future  work. 


CHAPTER  5 


MORE  COMPLEX  MODELS 

Chapter  3 answered,  for  a simple  model,  the 
fundamental  question  of  this  thesis:  What  is  the 

"best"  method  of  estimating  time  delay"  Chapter  4 
compared  this  method  with  several  other  candidate 
processors.  Chapter  5 considers  three  conceptually 
straightforward  extensions  of  the  problem  considered 
in  Chapter  3:  (1)  multiple  source  models,  (2)  moving 

source  models,  and  (3)  multiple  sensor  models.  The 
"solution"  to  these  problems  is  more  difficult  than 
the  problem  of  estimating  a single  time  delay  for  a 
stationary  source.  For  example,  in  the  multiple  source 
and  multiple  sensor  models,  there  is  more  than  one 
delay  to  be  estimated.  Indeed,  if  we  treated  multiple 
sources  and  multiple  sensors  together,  we  would  need  to 
estimate  a parameter  vector  for  each  source,  corres- 
ponding to  the  (relative)  delays  between  that  source 
and  each  sensor;  thus,  a (nonsnuare)  matrix  of  delays 
(comprised  of  a parameter  vector  for  each  source) 
would  need  to  be  estimated.  Finally,  it  is  necessary, 
in  effect,  to  estimate  the  motion  of  each  source  so  as 
to  be  able  to  Doppler  correct  the  received  signals 
prior  to  crosscorrelation.  Failure  to  apply  some  sort 
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of  Doppler  correction  will  cause  the  received  signals 
to  be  essentially  uncorrelated  even  if  a common  (but 
frequency  shifted)  signal  is  present. 

Both  notationally  and  analytically,  the  methods 
applied  to  estimate  the  unknown  parameters  become  more 
complex  than  the  methods  in  Chapter  3.  Yet  even  in 
Chapter  3 where  a "solution"  for  the  ML  estimate  of 
tim^>  delay  was  possible,  we  noted  that,  in  practice, 
it  would  be  necessary  to  resort  to  an  AML  estimation 
technique:  for  more  complex  models  there  is  no  reason 
to  expect  that  the  solution  will  become  simpler;  indeed, 
in  this  chapter  (especially  with  regard  to  moving 
sources),  we  appeal  more  to  aoproximate  and  ad  hoc 
techniques  based  on  the  ideas  of  Chapter  3 than  to 
rigorous  methodologies.  The  reasons  for  this  approach 
are  apparent  in  section  B and  have  to  do  with  the 
nonstationarities  introduced  by  the  source  motion. 

5A.  Multiple  Source  Models 
The  simplest  multiple  source  model  is  a two 
source  case  where  receiving  sensors  are  physically 
steered  at  one  source  and  the  second  source  acts  as  an 
interference.  Such  a model  is  depicted  in  Figure  5-1 
(Carter  and  Knapp  (1975)).  Mathematically, 

x^(t)  = s^(t)+S2(t)+n^(t)  (5-la) 

and 

Xo(t)  = Sj(t)+S2(t-D)+H2(t)  . 


(5-lb) 


Figure  5-1  Two  Directional  Source  Signals  Received  with  Noise 
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(The  effect  of  an  interfering  source  on  detection  is 

considered  by  Schultheiss  (1968).)  The  problem  is  to 

estimate  the  parameter  D.  In  effect  s^(t)  accounts  for 

correlated  noise  insofar  as  estimation  of  D is  concerned 

When  s^(t)  and  S2(t)  are  stationary  uncorrelated 

signals  with  power  spectra  G ^ (f)  and  (f)  and 

11  ^ ^ 


when  n^(t)  and  ng(t)  are  stationary  uncorrelated  noises 
with  the  same  power  spectrum  Gj^^(f),  it  has  been  shown 
by  Carter  and  Knapp  (1975)  that 

(f)  = 


^1^2 


-j2iTfDi 

1+  - .->--e 
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®1®1 


SiSi 


(f) 
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In  the  Special  case  when 


V2 


(f)=  i(i+e“^^^^^)=e"^’^^^cosTTfD 


(5-3) 


and 


C (f)=cos^TTfD  = J(l+cos2TrfD) . 
V - x_  ' ^ 


(5-4) 


^1^2 


Because  of  the  sinusoidal  oscillation  between  0 and  1 
of  C (f)i  the  Fourier  transform  of  (5-3)  will  exhibit 
a peak  &t  the  value  of  time  delay.  This  suggests  the 
usefulness  of  computing  the  Fourier  transform  of  the 
coherence  or  SCOT  (Carter,  Nuttall  and  Cable  (1973)). 

A more  general,  multiple  source,  two  sensor  model  is 


x-(t)  = Z s,(t)+n^(t) 

^ i 

X2(t)  = £ a.s^(t+D^)+n2(t)  . 


(.5- 5a) 


(5-5b) 
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The  limit  on  the  sum  depends  on  the  number  of  sources. 
Since  each  source  will  be  presiimed  to  be  independent  of 
the  others,  the  sources  will  be  mutually  uncorrelated. 

For  the  general  two  source  case  depicted  as  a multi-input, 
multtoutput  system  in  Figure  5-2,  it  follows  that 


x^(t)  = s^(t)■t■S2(t)-^nJ^(t) 

(5-6a) 

X2(t)  = aJ^s^(t■^DJ^)■^a2S2(t-^D2)+n2(t) 

( .5 -6b) 

and  therefore 

(5-7a) 

(5-7b) 

and 
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^1^2 


(f) 


= a G (f)e 
1 s^s^ 


+ G (f)  . 

"l"2 


-j2iTfDi 

-j2iTfD2 


(5-7c) 


However,  we  can  accommodate  coherent  noise  through  the 

inclusion  of  additj«.onal  sources  so  that  without  loss  of 

generality  G (f)=C  for  all  frequencies.  From  the 
"l"2 

two-source  model  with  incoherent  noise,  we  generalize  that 

X n (f)  (5-8a) 

*1*1  "l"l  i ®i®i 


(5-8b) 


and 


*1*2 


(f)  = Ect,G„  ^ (f)e 


i ^ ®i®i 


-j27rfD, 


(5-8c) 


Figure  5-2  General  Two  Source,  Two  Sensor  Model 
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In  the  ML  estimation  procedure  earlier  the  determinant 
of  could  be  ignored  since  it  did  not  depend  on  D. 
Now,  however,  for  the  two-source  model,  we  sey 
(suppressing  f)  that 


“A,., •'■"■'"■•■A, 


(5-9a) 


does  depend  on  (0^02).  For  example,  even  when 

G *G  *G  t Qi<“Qi»,“l  and  G “G  “G  , 
'*1*^1  12  ®1®1  ®2®2 


=(2G  +G  )^-G 
ss  nn  ss 

(5-9b) 

«4G  ^+2G  G +G  /-G  ^ [ 2+e"^ ^ ^^2~°1  ^+e'^^ ^ ^ °2'°1  • 

ss  ss  nn  nn  ss  j j 

(5-9c) 


In  general,  |Q|  depends  on  the  parameter  vector 
(Dj,D2>.  Thus,  we  must  be  concerned  by  the  |Q|  as  well 
as  the  exponent  in  (3-9),  for  the  multiple  source  model. 
Specifically,  we  want  to  maximize  the  sum  of  both  (3-17) 
and  the  lo8j,jQ|  term.  The  latter  is  Riven  by 
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(5-10) 


(5-11) 


In  practice,  and  X2  will  have  finite  bandwidth; 
therefore  the  limits  of  the  integral  (5-13)  will  also  be 
finite.  It  is  noteworthy  that  the  second  term  is  related 
to  the  definition  by  Shannon  (1949)  for  the  amount  of 
information  about  X2(t)  contained  in  x^^(t).  More 
specifically,  Gelfand  and  Yaglom  (1959)  and  Nettheim  (1966) 
have  shown  that  the  amount  of  information  about  x contained 
in  y (or  vice  versa)  is  given  by^ 

I = - ^ /log  (1-C  (f)]  df,  (5-14) 

xy  2 e xy 

where  the  limits  of  integration  are  over  the  nonzero 
range  of  the  integrand.  Hence,  for  C^y(f)=0,  there 
is  no  information  (in  the  linear  sense)'  contained  in  one 

^These  results  can  be  combined  with  (2-79)  for 
models  like  Figure  2-5  to  show  that  I is  the  integral 
of  the  logarithm  of  1 plus  received  signal  to  noise  ratio. 

2 

See  Carter  and  Knapp  (1975)  or  Chapter  2 for  a 
discussion  of  nonlinear  relations  which  can  yield 
Cj(y(f)=0  and  yet  y(t)  can  be  entirely  due  to  x(t).  as  for 
example,  when  y(t)=x^(t). 
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time  series  with  regard  to  the  other.  Alternatively,  if 

C (f)=l,  for  some  particular  f , then  there  is  an 
xy  p 

infinite  amount  of  information  about  x(t)  knowing  y(t) 

at  the  particular  freouency  fp.  More  generally,  for 

nonzero  C (f)<l,  the  amount  of  information  depends  on 
xy 

the  bandwidth  (limits  of  integration  in  (5-14))  and  the 
MSC  in  that  band. 

Thus,  following  (3-15)  and  (5-10)  through  (5-14), 
we  see  that  it  is  desired  to  maximize 
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(5-15) 


For  the  two  source  model. 
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(5-16) 


or  for  two-sensor,  multiple  source  model  we  maximize 
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« . (f)j:a,G„  ^ (f)e‘^J2’^^°i 
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lQ(f) 


df 


(5-17) 


Thus,  the  important  regions  of  the  estimated  cross 

spectrum  for  determining  are  these  frequency  bands 

where  G (f)  is  large.  However,  even  when  the  signal 
®i®i 

spectrum  is  strong,  if  the  intersource  interference  is 
such  that  the  intersensor  coherence  C (f)  is  low,  the 

XfX2 
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weight  attached  to  the  estimated  cross  spectrum  is 
degraded,  as  shown  above. 

While  we  can  estimate  auto  spectra  and  coherence 
between  sensors,  more  sophisticated  methods  must  be 
applied  In  order  to  estimate  the  source  signal  spectrum. 

The  mathematics  shows  how  to  process  for  known  signal 
spectrum.  In  the  communications  problem,  signal  spectrum 
will  generally  be  known,  although  a,  which  more  generally 
could  be  a function  of  frequency,  will  probably  not  be 
known.  In  other  problems,  methods  involving  classification 
and  data  bank  retrieval  need  to  be  studied.  In  the 
absence  of  a priori  knowledge,  we  might  assume  that 
every  frequency  band  where  the  coherence  was  high  was 
a different  source.  Tracking  (that  is,  estimating 
bearing  continuously)  for  each  frequency  band  then 
becomes  a classification  problem  where  the  number  of 
sources  is  ascertained  by  noting  the  number  of  clustered 
sources.  The  fewer  the  sources  for  a given  total  source 
power  the  easier  tracking  will  be.  However,  repeated 
clustering  analysis  will  be  desirable  to  ascertain 
whether  two  or  more  sources  are  being  classified  as  one. 

In  "real  world"  problems,  there  may  well  be  more 
than  one  source;  hence,  the  application  of  Chapter  3 
results  must  include  the  concepts  of  multiple  sources. 

There  are  other  concerns,  too,  in  the  practical 
application  of  our  Chapter  3 results.  The  next 
generalization  which  we  will  discuss  is  the  moving  source 
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problem. 


5B.  Moving  Source  Models 
The  model  we  shall  consider  is  a simplified 
one  characterized  by  the  observed  waveforms  (Carter 
and  Knapp  (1976b)) 

x^(t)=s(t  )+n^(t)  (5-18a; 

y2(t)=as(6t+D)+n2(t)  , (5-18b) 


where  s(t),  n^(t)  and  02(1)  are  zero  mean  jointly 
stationary  Gaussian  random  processes  which  are  mutually 
uncorrelate J.  The  problem  addressed  here  is  ML 
estimation  of  the  time  compression  and  delay  parameters 
8 and  D,  respectively;  the  problem  is  related  to  the 
Doppler  shift  work  by  Van  Trees  (1971).  The  character- 
istics of  the  signal  and  noise  are  such  that  x^{t)  is  a 
member  function  of  a zero  mean  stationary  Gaussian 
random  process.  Further,  despite  the  attenuation,  delay 
and  time  compression,  y«(t)  is  also  stationary  and 
Gaussian.  That  is,  both  autocorrelation  functions  given 
by 


\ X 
^1*1 


(5-19a) 


and 


depend  only  on  the  time  difference  f2“^l' 

However,  the  crosscorrelation  for  model  (5-18) 


(5-19b) 


depends  on  B as  follows; 
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Rjj  y (t j^.t2)=aE[a(t^)s(8t2+D))  =aRgg(t^-6t2-D) 
1 2 


As  required, 


(5-1 9c) 
(5-19d) 

(S-20) 


Notice  the  crosscorrelation  depends  on  0 as  well  at 
and  tg,  and  not  simply  the  difference  between  tj^  and  tj. 
Hence  the  processes  Xj^(t)  and  y2(t)  are  not  jointly 
second  order  stationary,  but  depend  on  the  absolute  tijne 
origin.  Thus,  the  introduction  of  time  compression  p in 
our  model  thereby  complicates  the  theory  through  the 
imposition  of  a second  order  nonstationarity . [For  a 
variety  of  practical  reasons,  we  desire  to  operate  on 
y2<t)  in  order  to  ensure  complete  stationarity.) 

An  ad  hoc  technique  for  estimating  h is  to 
operate  on  remove  (or  adjust)  the  tiire  scale 

change  6.  The  result,  referred  to  as  XgCt),  may  then 
be  used  with  Xj^(t)  in  the  usual  ML  estimator  of  Chapter 
3.  This  indeed  turns  out  to  be  the  ML  estimator  for  this 
problem  (as  is  subsequently  shown).  A major  problem, 
of  course,  is  that  8 as  we? 1 as  delay  D must  be  estimated 
to  undo  the  time  scaling  Introduced  by  motion  of  the 
source.  Suppose  8_ , for  example,  is  one  estimate  (or 

a 

hypothesis)  of  8 (like  x was  a hypothesized  delay  in 
Chapter  3)  and  let 

X2(t)  ^ y2(^/6a) 


(5-21a) 
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= as(Bt/6^+D)+n2(t/3^)  ■ (o-21h) 

Now  the  crosscorrelation  of  x^(t)  with  x^Ct)  is  given  by 

(5-22a) 


E[x^(t^)X2(t2)] 

'3. 


(5-22b: 


Thus,  for  6^=6.  we  see  that  x^(t)  and  X2(t)  are  second 
order  jointly  stationary,  for  then  R (t.  ,t„)  depends 

X1X2  J ^ 

02  ' y on  the  time  difference  T=t--t«.  For  B =3,  it  is 

1 ^ ci 

possible  to  compute  a single  Fourier  transformation 
^ T to  achieve 


12  -00  12 


(5-23a) 


=aG  (f  , 

ss'  ' 


(5-23b) 


Similar  results  can  be  obtained  using  the  ci.ncept  of 
locally  stationary  random  processes  (Silverman  (1957)). 
However,  in  general,  when  B/6  , a two-dimensional 
Fourier  transformation  must  be  performed.  For  convenience 
let  B“6/6  (where  we  ultimately  hope  to  make  3=1  by 

Si 

proper  choice  of  6 )^;  then  it  follows  that 

Si 


E (k)X2(  1 )j  ■ 

o o 


(5-24a) 


e->’o(kti-lt2) 


In  the  following  it  may  be  assumed  that  B^=l 
and  3=6  ; that  is,  that  V2(t)  has  not  been  preprocessed. 
Results  can  then  be  applied  with  6=1  (rather  than  3=1); 
for  m.any  problems  6=1. 


124 


i 


(w-ko).  )T/2  gT/2(u)-lw^/g) 
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Equation  (5-24)  offers  a more  rigorous  interpretation  of 
(5-23).  For  large  T and  S near  unity,  it  follows  from 
(5-24f)  (since  the  discrepancy  between  the  sine  functions 
is  minor)  that 


^ (f)  = T E(Xi(k)X2(l)l 
1 2 


, l=kS 
, Ij^kB  . 


(5-25a) 


(5-25b) 


Also . 


E(Xi(k)Xi(l)l- 

,lfk 


(5-25C) 


and 


T E[X2(k)X2(D] 


ko). 

a n2H2  a A'  g ss  g ’ 

0 ,l^k' 


(5-25d) 


Note  in  (5-25d)  G„  „ is  evaluated  &x  B ku.  not  kcu, . 

02*12  a A A 

Similarly,  it  can  be  shown  for  B=1  and  large  T,  that 


E[X2(k)Xj(l)] 


aGgs(ka)A)e’^^‘^A°  l=k/6 


0 


l?‘k/B  . 


(5-26) 


We  now  proceed  as  in  Chapter  3,  Section  A.  In  particular, 
we  desire  to  maximize  a total  award  function  J^,  as 
depicted  in  Figure  5-3,  through  the  adjustment  of 
hypothesized  compression  B.  and  hypothesized  delay  x; 
when  is  maximized,  the  ML  estimates  B and  D depicted 
in  Figure  5-3  are  achieved. 

It  is  important  to  the  discussion  that  follows  to 
note  that  if  B.  is  incorrectly  selected  such  that  B is 


gure  5-3  Processing  to  Estimate 
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much  different  from  unity,  the  processes  x^(t)  and  x,,(t) 
are  second  order  Jointly  nonstationary  and  the  estimators 
are  not  ML  estimators.  However,  once  we  have  begun  to 
estimate  delay  and  compression  correctly,  the  processor 
is  an  ML  estimator;  that  is,  in  the  sequential  estimation 
problem  where  several  observation  intervals  are  available, 
then  ML  or  at  least  AML  estimation  is  possible  in  the  last 
intervals.  Before  proceeding,  we  also  note  that  if 
$5^1  any  crosscorrelation  (coherence)  terms  in  the  award 
will  be  zero.  More  specifically,  if  g is  much 
different  from  unity,  then  time  delay  cannot  be  estimated 
without  some  type  of  Doppler  or  time  compression 
preprocessing.  The  importance  of  this  statement  is  that 
Chapter  3 cannot  be  applied  to  estimate  bearing  to  moving 
sources  which  are  nearfield  (relative  to  the  sensor 
separation)  unless  time  compression  preprocessing  is  done. 
Denote  the  Fourier  coefficients  of  x^(t)  and  Xg(t)  as  in 
Chapter  3.  The  2N+1  vectors  X(k)  = [x.j^(k)  ,X2(Pk)]  ' ,k=  -N, 
-N+1,...,N  for  6=1,  are  uncorrelated  Gaussian  (hence, 
independent)  random  variables.  More  explicitly,  because 
of  the  independence,  the  pdf  for 

Xe{X^(-N),X2(-N6)}'  ,{X^(-N+l),X2l(-N+l)B]  )'.  . . .f  X^CN)  . 

X2(NB)}  ' 

given  the  true  values  of  attenuation  a,  delay  D and  time 
compression  6 (actually  we  also  are  given  B hence  are 

Si 

"given"  6=6/6„  ) is  the  product  of  the  individual 

Si 


densities . 
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Specifically  when  B =1  and  P=1  the  pdf  of  x is 

& 


p(X|a,B,D)  = n [h.exp(-  |j^)] 
k=-N  ^ ^ ^ 


(5-27a) 


where 


Jfc  •=  T[x*(k)x*(k)lQ^"^  (ku)^) 


Xj^(k) 

XgOt) 


and 


-1 


\ =[(2n)|Qx(ku)^)ri  , 


(5-27b) 


(5-27C) 


and  Qjj(^)  is  the  power  spectral  density  matrix  between 
the  random  procefises  x^(t)  and  XgCt). 

For  ML  estimation,  it  is  desired  to  simultaneously 


choose  as  6 and  B those  values  which  maximize  the  pdf 
evaluated  for  hypothesized  compression  B.  and  hypothesized 

a 

A A 

delay  t.  Equivalently,  B and  D are  selected  to  maximize 
any  monotonically  increasing  transformation  of  the  pdf. 

A A 

Hence,  B and  D are  selected  to  maximize  the  log  pdf, 
namely, 


N N 

= In  p(X|a,B,D)  Llnhj^-  | Jj^ 

k=-N  k=--N 


(5-28) 


While  the  derivation  provides  sufficient  information  on 
estimating  the  parameters  B and  D,  it  is  valuable  to 
interpret  (5-28)  in  order  to  understand  both  its  meaning 
and  its  implementation.  The  award  to  be  maximized  (5-28) 
can  be  written  (assuming  large  T)  as  three  terms 
substituting  (5-14)  and  (3-15) 


1 


VVz'-JIVl  -IV2l'l-=12> 

df  . (5-29) 

Unlike  Chapter  3,  C^2  depends  on  6-  Equation  (5-29) 
is  difficult  to  interpret;  it  is  comprised  of  three  terms. 
For  ML  estimation  (versus  AML  estimation),  only  the  last 
two  terms  of  (5-29)  depend  on  the  data.  However,  the 
parameters  8 and  D appear  in  all  three  terms  of  (5-29); 
hence,  all  three  terms  must  be  considered.  The  first 
term  of  (5-29)  is  small  with  respect  to  the  second  term 
(because,  from  (5-14),  the  information  has  a logarithm 
in  it);  also,  the  first  term  of  (5-29)  is  small  with 
respect  to  the  third  term.  Hence,  we  might  expect  that 
the  first  term  can  be  ignored.  However,  under  some 
common  degenerate  cases  (specifically,  r^D  and  T very 
large)  the  sum  of  the  second  and  third  terms  does  not 
depend  on  the  parsuneters  B and  D.  For  example,  for  t=d  and 

^ O'  ^ 4-1  OanH  A kIO  |,i~J2TrfD 


00  G G 

1 *1*1 

-/  i + 77 


X X 1 ”^*1*2  ^12 

*2*2  1 df+/  T7.— — 1-^ 

tr—n — *1  _ b.  ..  r 1 r 


very  large  T,  6 ^ „ , i*l,2and 

*i*i  *Ti 


*1*2  *1*2 


and  the  sum  of  the  last  two  terms  of  (5-29)  becomes 
® 1-C 

-/  -j-  ydf , which  is  a constant.  This  situation  is 

perplexing  since  the  remaining  tenri  in  (5-29)  (namely, 
the  information  (5-14))  does  not  depend  on  the  data,  but 
only  on  the  (assumed  known)  statistics  of  the  data.  It 
is  interesting  that  when  this  is  the  case  and  when  we 
apply  AML  techniques  (that  is,  we  use  estimated  data 
statistics  for  assumed  known  statistics),  the  data  do 
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appear  ii  ths  expression  for  the  information. 

Finally,  we  notice  if  as  a suboptimum  technique, 
we  were  to  take  the  first  or  last  term  in  (5-29)  and 
simply  maxiaiize  it,  that  to  do  so  would  require  adjusting 
the  parameter  estimates  so  as  to  attempt  to  increase  the 
coherence  across  the  entire  frequency  band;  the  second 
term  of  (5-29)  does  just  the  opposite.  Notice  when  the 
time  compression  is  estimated  incorrectly, 
only  the  information  needed  to  estimate 

compression.  Having  estimated  compression  correctly, 
only  the  last  term  of  (5-29)  is  needed  to  estimate  delay. 
This  suggests  a suboptimum  ad  hoc  technique  for  estimating 
6 and  D,  namely,  maximize  the  information  to  estimate 


6 then  use  that  6 to  estimate  D with  the  award  function 
of  Chapter  3.  In  practice,  this  subcptimum  technique 
should  compare  favorably  with  maximizing  (5-29),  since 
there  are  a number  of  assumptions  and  approximations 
leading  to  the  award  function  (5-29);  most  notably, 

(5-29)  presumes  6=1  so  that  joint  second  order  stationarity 
holds.  When  this  is  not  the  case,  maximizing  (5-29) 
becomes  simply  an  advisable  but  ad  hoc  estimation 
procedure . 

There  are  some  degenerate  cases  of  the  model 
(5-18)  that  are  easier  to  work  with  analytically  (namely, 

D known  and  equal  to  zero,  n2(t)=0  and  a=l).  Such  models 
have  rather  predictable  results  (namely,  the  cross- 
correlation terms  are  important  except  as  G (!)-♦«>, 

n<  n< 


131 


that  is,  as  one  of  the  observation  channels  becomes 
noise  dominated;  in  the  later  case,  the  hypothesized 
time  compression  attempts  to  align  the  estimated  auto 
spectrum  with  the  (known)  signal  spectrum).  Thus,  the 
degenerate  cases  do  not  add  insight  into  the  fundamental 
issue  of  stationarity . We  are  thus  led  to  state  that 
maximizing  (5-29)  (or  first  (5-14)  and  then  the  last 
term  of  (5-29))  by  choice  of  3 and  6 (respectively)  is 
merely  an  intuitively  appealing  ad  hoc  technique. 

5C.  Multiple  Sensor  Models 

The  problem  we  address  here  is  estimation  of  a 
parameter  vector  D from  a set  of  sensors  with  received 
voltages 

x^(t)  = a^s(t+D^)+n^(t)  i=l,2...  . (5-30) 

Although  the  notation  for  is  the  same  as  Section  A, 
this  model  should  not  be  confused  with  a multiple 
source  model,  since  this  model  is  only  one  source  but 
many  sensors.  To  extend  the  problem  to  many  moving 
sources  received  at  many  sensors  requires  that 

' Pk  (5-31) 

In  the  model  (5-30),  we  assume  (without  loss  of 
generality)  that  oij*l  and  Dj^*0;  thus 

Xj^(t)  = s(t)+Uj^(t)  (5-32) 

X2(t)  = a2s(t+D2)+n2(t) 


Xm(t)  = anjS(t+Dj^)+Uj^(t) 
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and  we  desire  to  estimate  the  M-1  dimension  relative 
delay  vector  (Dg-D^^,  Dg-D^^,  . . . . 

The  general  solution  to  this  problem  is  simply 
an  extension  of  the  alternate  realization  ir  Chapter  3, 
Section  3C.  In  particular,  the  steering  vector  is  now 


-j2TTfD2^  _^-j2TTfDMj 


V » [1,  0|^ 

For  uncorrelated  noises 

Q ■ dlagT^G  ] 

^n  n.n.  ^ 

1 Is 

The  IxM  vector  filter  is  given  by 


-1 


% y 


ss 


(5-33) 


(S-34) 


(5-35) 


Hence,  the  generalization  is  realized  by  extending 
Figure  3-10  to  M prefilters  with  one  at  each  sensor 
location  as  shown  in  Figure  5-4.  A more  explicit 
realization  is  given  in  Figure  5-5,  which  is  the  extension 
of  Figure  3-11. 


Figure  5-4  Multiple  Sensor  Estimation  of  Delay  Vect 


Multiple  Sensor  Processor  for  Estimating  Delay  Vector 


CHAPTER  6 


DISCUSSION 

6A.  Applications  and  Summary 
The  purpose  of  this  section  is  to  briefly 
summarize  and  discuss  the  applications  of  this  work. 

Most  of  the  applications  are  intimately  tied  to  the 
theoretical  results  already  presented  which  are  summarized 
in  the  subsequent  paragraphs.  The  primary  purpose  of  this 
section  is  to  highlight  applications  of  the  theory  with 
a minimum  of  reliance  on  mathematical  notation.  There 
are  three  main  applications  for  the  theory  of  time  delay 
estimation  discussed  in  the  following  three  subsections. 
First,  it  is  a useful  vehicle  for  parameter  identification. 
Second,  we  can  use  it  to  obtain  bearing  estimates. 

Finally,  under  certain  conditions  we  can  estimate  source 
position.  These  applications  rely  on  the  theory 
developed  in  the  preceding  text,  which  is  summarized  in 
the  following  two  paragraphs. 

This  dissertation  has  investigated  methodologies 
for  passive  estimation  of  the  rearing  to  a slowly 
moving  acoustically  radiating  source.  As  demonstrated, 
the  mathematics  for  the  solution  to  this  problem  is 
analogous  to  estimating  the  time  delay  between  two 
time  series.  Because  the  estimation  of  time  delay  is 
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closely  related  to  the  co!’erence  between  two  time 
series  an  extensive  investigation  of  coherence  has  bor^n 
presented.  New  results  on  using  coherence  to  provide 
information  about  linear  and  nonlinear  systems  have 
been  presented  and  proved. 

The  ML  estimate  of  time  delay  (under  jointly 
stationary  Gaussian  assumptions)  has  been  derived. 

The  65xplieit  dependence  of  the  time  delay  estimate 
coherence  is  evident  in  the  estimator  realization  if 
which  the  two  time  series  are  prefiltered  (to  accenti-ado 
frequency  bands  according  to  the  strength  of  the 
coherence)  and  subsequently  crosscorrelated . The 
hypothesized  delay  at  which  the  GCC  function  peaks  is 
time  delay  estimate.  From  the  GCC  realization  the 
variance  of  the  time  delay  estimate  has  been  obtained. 

By  use  of  a different  interpretation  of  the  MI.  estirr.)  or 
derivation,  other  realizations  have  been  obtained.  The 
GCC  realization  with  ML  weighting  is  compared  to  several 
other  proposed  weightings.  The  estimation  formulation 
has  been  extended  to  three  important  generalizations: 
multiple  sources,  moving  source  and  multiple  sensors. 
Nonstatlonarities  introduced  as  a result  of  source  motion 
are  studied.  These  results  can  now  be  applied  to  three 
problem  areasof  interest. 

6A1,  Parameter  Identification 

In  the  system  identification  problem  we  are  given 
a system  with  unknown  r.sscription . We  design  a probe 
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to  excite  the  system  and  ensure  that  the  probe  is 
sufficiently  rich  in  frequency  content  (G  (f)>0, 

A.  A 

Then  we  simultaneously  observe  (perhaps 
recoi' ) the  probe  (input)  and  response  (output)  of 
the  system.  The  objective  of  these  observations  is 
to  characterize  the  system.  In  Chapter  2 it  has  been 
shown  that  there  exists  a linear  filter  which  will 
characterize  the  system  if  the  MSC  is  unity  at  all 
frequencies.  (Appendix  C provides  a computer  program 
for  estimating  MSC  between  two  waveforms  (input  and 
output).)  When  the  MSC  is  not  unity,  the  characteriza- 
tion is  considerably  more  complex.  We  have  looked  at 
certain  no  memory  nonlinearities  and  shown  how  they  can 
be  characterized  by  orthogonal  polynomial  expansions. 

The  main  thrust  of  the  dissertation,  however, 
has  been  to  estimate  one  parameter  (delay)  when  the 
system  is  linear,  but  the  observations  are  corrupted 
by  noise.  Proper  estimation  of  just  this  one  parameter 
requires  knowledge  of  the  magnitude  transfer  function 
a (or  more  generally  la(^)l).  finally  knowledge 
of  the  noise  spectral  densities.  When  this  a priori 
knowledge  is  not  available.,  we  have  proposed  estimating 
the  unknown  quantities  and  substituting  them  in  place 
of  the  known  quantities.  There  is  no  rigorous 
derivation  to  support  this  procedure  other  than  to  note 
th.at  as  the  observation  time  becomes  large  the  estimated 
quantities  converge  to  the  true  ones.  Thus,  the 
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methodologies  applied  to  the  time  delay  estimations  can 
be  expected  to  be  even  more  complex  if,  for  example, 
the  filter  output  were  X2(t)=aj^S(t+D^)+a2S(t+D2)+n2(t) . 
More  generally,  if  XgCt)  was  the  output  of  an  FIR 
digital  filter  of  unknown  order  then  the  problem  of 
estimating  the  order,  the  delays  and  the  attenuations 
(see  Hannan  and  Thomson  (1971),  Hannan  and  Robinson 
(1973)  and  Carter  and  Knapp  (1976a))  is  a more  general 
problem  than  the  one  addressed  here.  However,  to  solve 
the  bearing  estimation  problem  motivating  this  research, 
the  added  generality  is  not  required.  Thus,  the  problem 
considered  here  is  only  a subset  of  the  parameter 
identification  problem.  Further,  note  that  the  solution 
to  the  time  delay  estimation  problem  does  not  involve 
the  Fourier  transform  of  the  optimum  Wiener-Hopf  filter 
(Roth  processor),  which  maps  Xj^(t)  closest  to  X2(t); 
that  is,  the  technique  does  not  look  at  peaks  or 
midpoint  of  the  impulse  response  of  the  filter  that 
in  the  MMSE  sense  filters  x^(t)  to  obtain  an  optimum 
X2(t).  With  these  comments  in  mind,  we  have  generalized 
our  model  to  an  Important  class  of  nonstatlonarities 
in  order  to  estimate  bearing. 

6A2.  Bearing  Estimation 
The  bearing  estimate  follows  directly  from  the 
delay  estimate  according  to  the  simple  arccos  trans- 
formation (3-2).  The  range  does  not  need  to  be  too 
great  relative  to  the  sensor  separation  in  order  for  the 
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angle  that  the  hyperbola  asymptote  makes  with  the 
baseline  to  accurately  represent  the  source  beaT'ing. 

For  stationary  sources  or  closely  spaced  sensors,  the 
relative  Doppler  (or  more  generally,  the  time 
compression)  can  be  ignored.  However,  to  apply 
these  techniques  to  widely  separated  sensors  and 
moving  sources,  it  is  necessary  to  process  the  data 
in  order  to  perform  Doppler  correction  (that  is,  a 
time  scale  correction  or  time  scale  expansion).  To 
ignore  this  processing  would  result  in  an  apparent 
uncorrelated  behavior  between  the  two  received  waveforms. 
One  contribution  of  this  work  has  been  to  specify  an 
ML  estimate  of  time  compression.  However,  because  of 
the  nonstationarity  of  the  processes  involved,  the 
results  tend  to  be  more  heuristic  and  more  difficult 
to  interpret  (and  implement)  than  those  for  the  time 
delay  estimation  problem.  In  fact,  the  implementation 
is  hindered  by  practical  computational  issues  of  achievinii 
the  time  compression.  Nevertheless,  in  the  future  as 
computational  methods  allow  for  broadband  time 
compression,  the  methods  hypothesized  heie  could  actually 
be  tested  in  practical  environments.  This  should  not 
be  interpreted  to  mean  that  time  compression  cannot 
currently  be  accomplished.  Exact  time  compression  can 
be  achieved,  as  for  example,  with  variable  speed  tape 
recorders  or  with  exact  DFT’s.  Approximate  time 
compression  can  also  be  achieved  through  complex  inter- 
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polation  of  FFT  points  or  nearest  FFT  bin  approaches. 

In  practice,  all  of  these  techniques  are  expensive 
to  implement;  hence,  any  production  application  of  the 
theory  will  benefit  from  advances  in  methodologies 
and  mechanizations  for  achieving  time  compression. 

Having  techniques  for  estimating  the  bearing  to  moving 
acoustic  sources,  we  can  extend  the  applications  of  our 
theory  to  estimating  range. 

. 6A3,  Passive  Ranging 

In  the  two  sensor  models,  we  are  able  to  estimate 
delay  from  which  we  can  estimate  bearing.  In  the 
multiple  sensor  situation  more  information  is  available. 
Indeed,  with  three  sensors  we  can  also  estimate  source 
location.  For  example,  in  Figure  6-1  three  equispaced 
colUnear  sensors  are  depicted.  As  indicated  in 
section  5C,  the  estimate  of  simultaneously 

processing  data  from  all  three  sensors  (one  suboptimum 
processor  would  be  to  estimate  each  bearing  from 
generalized  crosscorrelations  between  only  two  sensors). 
When  the  sensor-pair  midpoints  are  separated  by  distance 
d (meters),  the  range  (meters)  to  the  source  is  given  by 

dsin0„ 

^ * “sin(0j-02)  ■ 

An  estimated  range  is  obtained  by  inserting  estimated 
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d 


Figure  6-1  Three  CoUlnear  Sensors,  Single  Source 
Passive  Ranging  Geometry 
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bearings  in  (6-1).^  The  asymptotes  depicted  in 
Figure  6-1  are  upper  bounds  (biased  estimates  of 
hyperbolic  LOP's);  hence,  the  actual  source  location 
will  be  slightly  "below"  the  intersection  depicted. 

For  R>>d,  the  bias  will  not  be  a practical  concern. 

For  more  complicated  sensor  geometries  (see 
Figure  6-2),  the  bearings  0^  and  Gg  are  used  to  obtain 
effective  bearings  0^®  and  When  the  sensor 

geometry  is  known,  the  effective  bearings  are  easily 
obtained  by  the  addition  of  a correction  term  to  the 
observed  bearing.  Similarly,  the  effective  separation 
dg  is  simply  the  shortest  distance  between  the  midpoints 
of  the  sensor  pairs  (1,2)  and  (2,3).  The  range  estimate 
is  then  obtained  by  substituting  effective  measurements 
into  (6-1).  When  four  or  more  sensors  are  used  to 
estimate  three  or  more  LOP's,  source  position  may  be 
ambiguously  specified,  as  shown  by  points  A,  B,  C in 
Figure  6-3.  In  such  a case,  it  is  reasonable  to  presume 
that  the  source  is  the  least  squares  distance  from 
existing  LOP's;  although  it  is  possible  for  two  or 
three  sources  to  be  present. 


^The  estimated  position  (range  and  bearing,  in 
polar  coordinates)  obtained  by  substituting  ML 
estimates  of  the  bearings  into  (6-1)  is  not  necessarily 
the  ML  estimate  of  position. 


Figure  6-3  Three  Estimated  LOPs  to  On 
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6B.  Suggestions  for  Future  Work 

This  section  suggests  four  areas  for  future 
work.  In  a sense,  It  provides  an  insight  into  what 
we  still  do  not  know  about  the  problem  at  hand.  Or 
stated  differently,  having  solved  the  problem  we  set 
out  to  solve,  we  now  understand  how  to  nose  new  problems 
which  we  have  uncovered.  First,  in  the  parameter 
identification  area  there  appear  to  be  several  fruitful 
research  questions:  How  to  identify  parameters  for 

(1)  general  (or  particular)  nonlinear  systems,  (2) 
multi-input,  multi-output  linear  systems,  (3)  general 
linear  systems,  and  finally  (4)  "real  world"  socio- 
economic systems.  The  complexity  of  estimating  time 
delay  suggests  that  the  solution  to  these  problems  will 
be  more  complex. 

The  second  area  is  verification  of  the  theory 
by  simulation.  We  have  already  conducted  one  costly 
computer  experiment  (Appendix  D)  which  substantiates 
our  belief  that  insertion  of  estimated  spectra  for 
true  spectra  enhances  the  estimation  of  time  delay. 
However,  without  running  many  such  experiments,  we  have 
no  statistical  argument  to  substantiate  the  theory. 
Because  the  cost  of  running  this  analysis  is  prohibitive 
on  a large  scale,  digital  computer,  special  purpose 
FFT  hardware  should  be  used  to  empirically  validate  the 
theory.  The  cost  of  such  a system  will  be  significant. 
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The  third  area  of  investiBation  is  an  extension 
of  the  theory  to  sequential  estimation.  In  practice, 
our  observation  interval  will  not  be  just  T seconds; 
rather  there  will  be  several  consecutive  periods  of 
T seconds.  Knowing  that  the  source  is  constrained  in 
its  rate  of  speed,  we  should  be  able  to  rule  out 
certain  ambiguous  estimates  of  delay  (bearing).  More 
generally,  we  could  model  the  ship’s  track  and  use 
Kalman  filter  techniques  to  extrapolate  best  projected 
position  (bearing)  based  on  the  filter  outputs. 

Finally,  the  theory  presumes  a great  deal  about 
(1)  ocean  acting  as  a linear  time  Invariant  filter  over 
the  observation  period  T,  (2)  the  characteristics  of  the 
noise,  and  (3)  the  source  motion.  Thus,  the  true 
engineering  test  is  to  make  controlled  measurements  with 
actual  acoustic  sources  in  the  ocean  in  order  to  test 
the  hypothesis.  Based  on  what  we  currently  know,  there 
is  every  reason  to  believe  such  an  endeavor  will  be 


successful . 


i 
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APPENDIX  A 

TECHNIQUES  FOR  SPECTRAL  ESTIMATION 
The  basic  objective  of  this  appendix  is  to 
briefly  describe  two  (similar)  techniques  used  to 
estimate  the  elements  of  the  power  spectral  density 
matrix.  The  estimates  obtained  are  then  used  to  form 
an  AML  estimate  of  time  delay.  The  two  techniques  are 
the  overlapped  FFT  technique  (discussed  by  Carter, 

Knapp,  and  Nuttall  (1973a))  and  the  Chirp-Z  transform 
(CZT)  technique  (discussed  by  Carter  and  Knapp  (1975)). 

The  methods  discussed  are  sometimes  referred  to  as 
direct  methods  (as  opposed  to  indirect  correlation 
methods)  and  have  been  discussed  in  part  by  Knapp  (1966), 
Welch  (1967),  Bingham,  Godfrey  and  Tukey  (1967),  Benignus 
(1969a),  Nuttall  (1971),  Williams  (1971),  and  Rabiner  and 
Rader  (1972). 

Both  methods  begin  with  two  (one  from  each  process) 
digital  waveforms  (or  with  analog  wavefonns  that  have 
been  lowpass  filtered  and  digitized).  Briefly,  there 
are  four  steps  in  the  estimation  procedure:  First,  each 

time  series  is  segmented  into  N segments,  each  having 
P-data  points.  Second,  each  segment  is  multiplied  by 
a smooth  weighting  function.  Third,  the  Z transform  of 
the  weighted  P-point  sequence  is  evaluated  on  the  unit 
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circle  in  the  Z-plane.  Finally,  the  Fourier  coef f iclen Is 
thus  obtained  are  used  to  estimate  the  elements  of  the 
power  spectral  density  matrix  by  averaging  "raw”  power 
spectral  estimates  over  all  the  N segments.  The  two 
methods  of  spectral  estimation  differ  in  how  the  Z transform 
is  evaluated.  One  method  uses  the  FFT;  the  other  uses  the 
Partitioned  and  M'^dlfied  CZT  (pam-CZT). 

More  explicitly,  two  random  processes  that  are 
jointly  stationary  over  N data  segments  are  processed 
as  follows  (Carter  and  Knapp  (1975)): 

1.  Each  of  the  two  time  series  is  segmented 
into  N segments  of  P points.  The  segments  may  either 

be  disjoint  or  overlapped.  Then  one  segment  of  P data 
points  with  the  same  time  origin  is  selected  from  each 
of  two  time  records.  Even  if  each  of  the  N data  segments 
is  large  (for  example,  greater  than  4096),  P should  be 
selected  to  ensure  that  the  sampling  frequency  divided 
by  P will  afford  adequate  spectral  resolution. 

2.  Each  of  the  two  P point  segments  is 
multiplied  by  ^ smooth  weighting  function.  Here  smooth 
means  that  the  t-th  order  derivative  is  continuous  over 
the  full  interval  of  data  points,  for  t=0,  1,  2,  ...  up 
to  some  reasonable  limit.  The  smoother  the  weighting 
function,  the  more  rapidly  the  side  lobes  of  its  Fourier 
transform,  or  window  function,  will  decay.  The  more 
impulse-like  the  window,  the  less  leakage  there  will  be 
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Henco , good  weighting  functions  result  in  better  spectral 
estimates.  The  price  paid  for  impulse-like  window 
functions  with  rapidly  decaying  side  lobes  is  a wider 
main  lobe,  that  is,  poorer  frequency  resolution  when 
P is  held  fixed.  Tf  better  resolution  is  desired,  more 
data  points  per  segment  will  be  required.  This  in  turn 
requires  both  that  the  data  be  available  and  that  tney  can 
be  efficitiitly  prcccGCcd.  Moreover,  from  ?.  stability 
point  of  view,  increasing  P decreases  the  available 
number  of  independent  data  segments  when  the  data  duration 
is  finite. 

The  specific  selection  of  a weighting  function 
involves  a number  of  tradeoffs.  A commonly  used  weighting 
(or  windowing)  function  is  the  cosine  (Hanning)  function 
defined  at  the  p-th  instant  in  the  interval  (0,P)  ay 


such  a function  starts  out  at  zero  for  p=0  smoothly  rises 
to  unity  by  p*P/2  and  smoothly  decays  to  zero  at  p=P. 

The  application  of  a cosine-weighting  function, 
which  is  necessary  to  reduce  errors  due  to  side  lobe 
leakage,  has  the  disadvantage  of  apparently  wasting  the 
vailable  <ita.  This  apparent  wastage  can  be  overcome 
through  overlapped  processing.  In  particular,  Nuttall 
(1971)  has  shown  that  the  same  stability  (as  measured  by 
the  number  of  equivalent  degrees  of  freedom)  can  be 
obtained  from  a fixed  amount  of  data  via  overlapped 
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processing  as  with  Hlackman  and  Tuk^y  (1958)  correlation 
processing  for  both  auto  and  cross  spectral  density 

estimation.  (Results  on  cross  spectra  processing 

I 

followed  in  a supplemental  report.) 

1 

' Quite  naturally,  there  is  an  increase  in 

I computational  cost  associated  with  overlapped  processing. 

I Specifically,  the  number  of  FFTs  to  be  performed  (a 

measure  ot  the  computational  cost)  increases  with  the 
percent  overlap  specified.  For  example,  the  number  of 
FFTs  reqviired  for  50-percent  overlap  is  approximately 
twice  the  number  for  0-percent  overlap.  Increasing  the 
I overlap  from  50-percent  to  62.5  percent  requires 

j 32-percent  more  FFTs.  For  Hanning  weighting,  the 

improvement  to  be  derived  from  using  62.5-percent  overlap, 
as  opposed  to  50-percent  overlap,  will  not  usually 
j warrant  the  increased  computational  costs  (Carter,  Knapp, 

and  Nuttall  (1973a)). 

\ Note  that  if  there  is  no  overlap,  each  segment 

1 

* would  be  virtually  independent  of  the  previous  one 

(except  for  correlated  edge  effects).  Independent  data 
segments  facilitate  certain  analytic  computations.  Hence, 
all  theoretical  results  here  are  concerned  with  the  case 
of  independent  segments;  that  is,  no  overlap.  This  is 
true  even  though  overlapped  processing  is  recommended 
j for  actual  data  processing.  The  aunount  of  overlap 

desirable  can  be  predicted  by  picturing  the  apparent 
wastage  for  a specific  weighting. 


I 


ill 
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3.  The  transform  of  the  weighted  P-point 
sequence  is  evaluated  on  the  unit  circle  in  th^  z plane 
The  two  sided  Z-transform  of  an  infinite  sequence  is 
defined  by  Gold  and  Rader  (1969)  and  Oppenheim  and 
Schafer  (1975)  as 


X (z)  = Z X (p)z"P,  n=l,2,...,N  , 


(A-1) 


P=_oo 

where  z equals  any  complex  variable. 

Similarly,  Y^(J^)  is  defined  as  the  Z-transform 
of  YjjCp)*  When  x^(p),  yj^(P)  are  finite  in  duration,  the 
infinite  series  (A-1)  becomes  finite.  Evaluation  of  the 
Z-transform  at  P equally  spaced  points  around  the  circle 


yields  the  DFT: 

P-1 


,(k)  = E X . 


(A-.) 


p=0 

Similarly,  Y^(k)  is  the  DFT  of  the  n-th  weighted  data 

segment  yjj(p),  p«0, 1 , . . . ,P-1 . The  DFT  can  rapidly  be 

evaluated  by  two  methods:  the  Cooley-Tukey  (1965)  or  the 

PAM-CZT  (see, for  example,  Rabiner,  Schafer,  and  Rader 

(1969),  Schilling  (1972),  Ferrie,  Nawrocki,  and  Carter 

(1973),  and  Carter  and  Knapp  (1975)).  The  FFT  is  a fast 

algorithm  for  evaluating  the  DFT.  If  the  DFT,  (A-2), 

is  evaluated  for  P frequencies  (k=0, 1 , . . . ,P-1)  it  requires 
2 

P (complex)  multiplications  and  additions  (MADs).  The 
FFT  uses  an  ingenious  computation  method  to  evaluate 
(A-2)  in  just  PloggP  MADs.  Thus,  for  P=4096 , the  number 
of  MADs  is  reduced  by  a factor  of  more  than  340.  Thus, 
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computations  requiring  more  than  5 hours  can  bo  done 
in  less  than  1 minute  using  FFTs  in  lieu  of  DFTs. 

Specific  details  of  the  FFT  are  beyond  the  scope  of 
this  dissertation. 

The  DFT,  (A-2),  is  a special  case  of  the  CZT , which 
was  introduced  by  Rabiner,  Schafer,  and  Rader  (1969) 
and  amplified,  including  software  implementation,  by 
Schilling  (1972)^  and  hardware  development  by  Alsup, 

Meansj  and  Whitehouse  (1973),  and  Buss,  Collins,  Bailey 
and  Reeves  (1973).  Given  sufficient  data,  it  is  a fast 
and  efficient  technique  for  computing  the  Z-transform  of 
a sequence  on  any  Z-plane  spiral.  The  modified  CZT 
(MCZT)  evaluates  equispaced  frequency  points  on  the 
unit  circle  in  the  Z-plane.  With  proper  spacing  and 
starting  points,  it  is  equivalent  to  the  DFT. 
Computationally,  the  MCZT  requires  three  FFTs  each  of 
size  greater  than  N (for  example,  2N)to  compute  the 
DFT,  (A-2).  However,  the  tradeoffs  are  really  more 
complex  than  this.  (For  example,  if  many  MCZTs  are 
to  be  performed  one  of  the  three  required  FFTs  does  not 
need  to  be  repeated  after  its  first  computation  since 
it  is  a transformed  cosine  data  table. ) The  major 
advantage  of  the  MCZT  occurs  when  the  number  of  data 
points  P (in  each  of  the  N data  segments)  is  large. 

^This  work  was  brought  to  the  author's  attention 
by  Dr.  N.  Ahmed,  Kansas  State  University,  Manhattan. 

Kansas . 
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In  such  cases,  the  original  P point  data  segment  can  be 
again  segmented  into  R partitions  each  disjoint  with 
size  P/R  data  points.  The  R partitions  are  processed 
with  R MCZTs;  the  outputs  are  summed  together  with 
appropriate  phasing  to  achieve  a PAM-CZT  that  is 
equivalent  to  the  DFT,  (A-2j.  The  mathematica]  details 
of  this  technique  are  covered  in  length  by  Ferrie, 

Nawrocki , and  Carter  (1975);  their  inclusion  here  does 
not  appreciably  add  to  the  discussion  but  do“s  considerably 
complicate  the  notation  due  to  conflicts  with  assigned 
symbols.  For  most  broad  band  cases  of  interest  (and 
certainly  the  example  case  in  Appendix  0),  the  FFT  will  be 
preferable  to  the  PAM-CZT.  A complete  discussion  of  the 
tradeoffs  is  given  by  Carter  and  Knapp  (1975). 

Having  computed  the  DFT,  (A-2) , either  by  an 
FFT  or  PAM-CZT,  we  are  ready  to  proceed  with  the  fourth 
step  in  the  spectral  estimation  algorithm. 

4.  The  spectral  estimates  are 


G (k)  = 

XX 


N 

Cg  E |x„(k)|^ 
n=l 


G (k)  = 

yy 


N 2 

=g  E IV"’!  • 

n=l 


N 


G (k) 
xy' 


c y X (k)Y’(k) 
g Z-i  n n^  ' 


n=l 


(A-3a) 


(A-3b) 


(A-3c) 


where  the  constant 
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N-f  -P 
s 


(A-3d) 


and  f = sampling  frequency.  (The  estimated  cross  spectrum 
s 


(A-3c)  is  complex.)  The  estimate  of  MSC 

- 


(A-4) 


G (k)G  (k) 

XX  yy'^  ’ 


The  AML  estimation  of  time  delay  requires  substituting 

y\ 

the  estimates  in  place  of  the  true  (but  unknown) 

value  of  MSC.  Therefore,  we  are  concerned  about  the 

statistical  variability  of  the  MSC.  Further,  the 

/\ 

statistical  characteristics  of  C are  of  interest  in  their 
own  right,  since  C is  useful  not  only  in  time  estimation 
(Chapter  3)  but  also  for  other  applications  (Chapter  2). 
Appendix  B discusses  the  statistics  of  the  MSC  estimate. 


APPENDIX  B 


1 

} 

1 


STATISTICS  OF  THE  MSC  ESTIMATE 
The  MSC  estimate,  from  (A-3)  through  (A-4),  is 


C (k)  = 
xy  ’ 


N ♦ I 9 


(B-1) 


N 

1 

x„(k) 

n=l 

3 

II 

where  N is  the  number  of  data  segments  employed  and 
X^(k),  Y^(k)  are  the  DFTs  of  the  n-th  weighted  data 
segments  of  x(t),  y(t),  respectively.  Under  certain 
assumptions  the  statistical  characteristics  of  C can 
be  evaluated.  This  appendix  is  divided  into  four 
sections.  The  first  section  gives  the  pdf,  cumulative 


distribution  function  (<>df),  and  m-th  moment  of  C,  given 
C and  N.  The  second  section  gives  the  bias  of  the 
estimate  C including  a discussion  of  when  the  analytic 
results  fail  and  simulations  to  support  the  theory. 

A 

The  third  section  gives  the  variance  of  C.  The  fourth 
section  gives  a computer  program  for  evaluating  receiver 
operating  characteristics  (ROC)  of  a linearly 
thresholded  coherence  estimation  processor.  The 
results  in  all  four  sections  are  based  on  the  derivation 
by  Goodman  (1957)  of  an  analytical  expression  for  the 
pdf  of  the  MC  estimate  and  the  subsequent  extensions  to 
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MSC  by  Carter,  Knapp,  and  Nuttall  (1973a).  These 
results  are  based  on  two  zero-mean  stochastic  processes 
that  were  jointly  stationary,  Gaussian , and  had  been 
segmented  into  N independent  segments.^  Each  segment 
was  assumed  large  enough  to  ensure  adequate  spectral 
resolution . Further,  each  segment  was  assumed  perfectly 
weighted  (windowed),  in  the  sense  that  the  Fourier 
coefficient  at  some  k-th  frequency  was  to  have  "leaked" 
no  power  from  other  binsv  The  statistics  do  not  hold 
at  the  zero-th  or  folding  frequencies  (Hannan  (1970)). 
Extensions  to  Goodman's  work  are  given  by  Alexander  and 
Vok  (1963),  Amos  and  Koopmans  (1963),  Enochson  and 
Goodman  (1965),  Nettheim  (1966),  Wahba  (1966),  Tick 
(1967),  Carter  and  Nuttall  (1972),  Carter,  Knapp  and 
Nuttall  (1973b),  Halvorsen  and  Bendat  (1975) ^and  Nuttall 
and  Carter  (1976a). 

Bl.  Probability  Density,  Cumulative  Distribution 
and  m-th  Moment  of  C 

The  first-order  pdf,  cdf  and  m-th  moment  of  the 
estimate  of  MSC,  given  the  true  v^alue  of  MSC  and  the 
number,  N,  of  independent  segments  processed,  are  presented 
in  this  section  in  closed  form. 


^Despite  the  fact  that  it  is  only  mathematically 
tractable  to  obtain  a.nalytic  expressions  when  the  segments 
are  independent,  we  would  in  practice  use  sc,  overlapped 
processing  to  regain  the  apparent  data  wastage  incurred 
by  the  necessity  of  data  weighting.  Carter,  Knapp,  and 
Nuttall  (1973a)  report  the  results  of  an  empirical  study 
that  demonstrates  how  bias  and  variance  decrease  as  a 
function  of  increased  data  segment  overlap.  Fifty  percent 
overlap  is  recommended  with  cosine  weighting. 
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The  conditional  pdf  for  C,  between  two  processes, 
given  C and  N,  is  (Carter,  Knapp,  and  Nuttall  (1973a)) 

p(C|N,C)  = (N-l)(l-C)^(l-C)^"^  (1-CC)^"^^2F^(1-N,1-N;1;CC). 

(B-2) 

The  2^2  ^ hypergeometric  function  with  two  numerator 

terms  and  one  denominator  term.  (It  is  a soecial  case  of 
(B-7)  and  is  discussed  more  fully  in  Sectio.>  B4 . ) For 
present,  we  note  equation  (B-2)  is  desirable  because 
nF. (1-N,1-N;1;CC)  can  be  expressed  as  an  (N-l)st  order 

A jL 

polynomial  (Abramowitz  and  Stegun  (1964),  Equation  (15.5.1)) 
A special  case  of  the  density  function  occurs  when 
C=0.  In  that  event, 

^ N_9 

(B-3) 

Using  a result  of  Fisher  (1950),  Carter,  Knapp,  and 
Nuttall  (1973a)  have  determined  (in  closed  form)  the 
cumulative  distribution  of  the  estimate  of  MSC,  namely, 


p(C|N,C=0)  = (N-D(l-C)^"^ 


(B-4) 


A digital  computer  program  to  evaluate  equation  (B-4) 
is  given  in  Section  B4 . In  the  special  case  when 
C=0,  the  cdf  can  be  simplified  to  give 

P(C|N,C=0)  = l-(l-C)^”^  . (B-5) 

Equation  (B-5),  when  differentiated,  yields  the  pdf 
equation  (B-2). 

The  m-th  moment  of  the  MSC  estimate  can  be  found  by 
application  of  Equation  7.512(12)  by  Gradshteyn  (1965)  to 
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a different  form  of  (B-1)  to  yield  (Carter,  Knapp,  and 
Nuttall  (1973a)) 


E[(C'"|N,C)]  = 


(1-C) 


N 


r(N)  r(m+i) 
r(N+m) 


3F2(m+l,  N,  N;  m+N,  1;C) 


(B-6) 


These  results  can  be  confirmed  using  Carter  (1972a)  and 
Anderson  (1958). 

The  3F2  hypergeometric  functions  (with  three 
numerator  terms  and  two  denominator  terms)  are  given  by 


2F2 ( a , b , c , 


d,e; 


z) 


00 

r* 

L 

k=0 


(B-7a) 


where  the  (a)j^  notation  is  Pochhammer's  symbol  (Abramowitz 
and  Stegun  (1964))  defined  by 


(a) 


k 


L r(a-Hk) 
r(a) 


(B-7b) 


where  T(  ) is  the  Gamma  function.  Similarly,  the  F two- 
one  function  has  two  numerator  and  one  denominator  terms. 

B2.  Bias  of  C 

This  section  deals  with  the  bias  of  the  MSC  estimate. 
Exact  and  approximate  expressions  are  presented.  In 
addition,  computer  evaluation  of  the  exact  expressions 
is  presented  to  lend  meaning  to  these  results,  and  two 
computer  simulations  are  presented.  The  first  simulation 
demonstrates  the  need  to  have  adequate  spectral  resolution. 
The  second  simulation  verifies  the  theoretical  results 
for  bias  (and  also  variance,  which  is  discussed  in  the 


next  section,  B3). 
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Consider  now  the  first  moment  of  the  estimate  of 

MSC  which  can  be  written  as 
N 

ElC|N,C]  = 3F2(2.N.N;N+1.1;C)  , (B-8) 

which  can  be  manipulated  into  the  form  (Carter  (1972a)) 

E(C|N,C)  = I + C 2F^(1.1;N+2;C)  . (B-9) 

The  bias  or  expected  estimation  error  is  defined 
as 

Bias  = B(C|N,C)  = E(C|N,C)  - C . (B-10) 

An  exact  expression  for  the  bias  is 

B(C|N,C)  = ^ + fjx  C 2Fj^(1.1;N+2;C)-C  . (B-11) 

The  maximum  bias  is  1/N  (regardless  of  N and  C),  The 

bias  is  plotted  in  Figure  B-1.  It  should  be  noted  that 

lim  (Bias)  « 0 ; (B-12) 

N-+® 

therefore,  the  estimator  may  be  referred  to  as  asymptotically 

unbiased.  By  expanding  in  (B-11)  in  a power  series 

-2 

in  C and  retaining  terms  to  order  N , the  following 
approximation  is  obtained  (Nuttall  and  Carter  (1976b)): 

B^(C,N)  = ^(1  - C)^(l+  . (B-13) 

Plots  of  N B(C,N)  and  N B^(C,N)  are  presented  in  Figure 
B-2  for  N=4  (they  cross  near  C=0.4).  Approximation 
(B-13)  is  seen  to  be  excellent  over  the  entire  range  of 
C.  Furthermore,  the  discrepancy  between  the  approximation 
(B-13)  and  the  true  bias  (B-11)  is  even  less  for  larger 
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values  of  N. 

For  large  N,  (B-13)  is  further  reduced  to  the 
approximation  given  by  Carter,  Knapp,  and  Nuttall  (1973a): 

B2(C,N)  = ^(1-C)^;  good  for  large  N . (B-14) 

Therefore,  as  N leads  to  infinity,  N B(C,N)  tends  to 
2 

(1-C)  , which  is  also  plotted  in  Figure  B-2;  furthermore, 
the  approach  is  monotonic. 

In  Benignus  (1969a),  (2),  an  approximate  expression 
for  the  bias,  based  upon  a simulation  approach,  is 
presented  as 

63(0, N)  = ^(1-C)  . (B-15) 

Whereas  the  results  in  Haubrich  (1965)  and  (B-14)  dictate 
a quadratic  behavior  for  bias,  the  approximation  by 
(B-15)  indicates  a linear  behavior.  Since  (B-11)  through 
(B-14)  is  based  upon  theory  and  (B-15)  is  based  upon 
simulation,  it  was  decided  to  verify  (or  invalidate) 

(B-11)  through  (B-14)  by  a simulation  approach.  Two 
computer  simulations  were  conducted. 

In  order  to  verify  the  theory,  the  simulation  must 
preserve  those  assumptions  present  in  the  derivation 
of  the  theoretical  expression  (B-11)  for  bias.  Specifically, 
as  pointed  out  by  Carter  and  Knapp  (1975),  (B-11)  holds 
under  the  following  assumptions: 

1.  jointly  Gaussian  stationary  processes 

2.  N independent  (non  overlapped)  data  segments 


3.  smooth  weighting  function  to  reduce  side 
lobe  leakage 

4.  adequate  frequency  resolution 

When  any  of  the  specified  assumptions  are  violated, 
analytic  results  derived  for  bias  (and  variance)  of 
the  estimator  can  be  grossly  misleading.  (The  Gaussian 
part  of  the  first  assumption  is  weak;  see  the  discussion 
after  (3-3).)  As  an  empirical  verification  of  this 
statement,  consider  the  study  reported  by  Carter  and 
Knapp  (1975),  where  C^y(f)  = l,Vf.  Specifically,  consider 
a simple  linear  second-order  digital  filter  of  the  form 

Y = 1.97300Y  - - 0.9S202Y  „ + 0.00872X  • (B-16) 

n n-i  n-2  n 

The  system  behavior  was  studied  by  probing  the  filter 

with  a white  pseudorandom  noise  source.  The  sampling 

rate  was  set  equal  to  2048  Hz;  hence,  the  Nyquist  rate 

of  7T  radians  is  depicted  as  1024  Hz  in  the  figures  that 

follow. 

The  filter  phase  characteristics  were  estimated. 
Figure  B-3,  with  P=1024,  cosine  weighting,  and  64 
independent  segments.  Despite  the  fact  that  the  MSC 
between  input  and  output  should  equal  unity  (hence,  the 
bias  of  the  estimator  would  normally  be  zero),  the 
estimate  of  MSC  is  grossly  biased  when  a rectangular 
weighting  function  is  used.  Specific  MSC  estimates  are 
depicted  in  Figure  B-4  for  the  rectangular  weighting 
case.  The  bias  attributable  to  improper  windowing,  while 


Estimates  of  C (True  Value  of  C is  Unity)  Due 
Lng  Function 


severe,  can  be  substantially  eliminated  through 
selection  of  a leakage-suppressing  window.  When  a 
cosine  or  Hanning  window  is  utilized  and  the  data  are 
reprocessed,  estimates  depicted  in  Figure  B-5  are 
obtained.  Notice  now  that  the  bias,  though  greatly 
improved,  still  exists  in  the  vicinity  of  30  Hz. 

Referring  to  Figure  B-3,  notice  that  30  Hz  is  the  center 
of  a frequency  band  in  which  the  first  derivative  of  the 
phase  is  large.  The  dependence  of  the  bias  of  the  MSC 
estimate  on  this  characteristic  of  phase  is  predicted  in 
Jenkins  and  Watts  (1968),  Hannan  (1970),  and  Koopmans 
(1974) . 

Once  sufficient  resolution  has  been  achieved,  this 
bias  no  longer  exists.  To  determine  whether  the  bias 
in  Figure  B-5  could  be  reduced  by  more  averaging,  as 
analytically  predicted  by  the  approximation  in  Jenkins 
and  Watts  (1968),  additional  independent  data  segments 
were  processed  in  the  simulation  (that  is,  N was  made 
larger  without  changing  P).  In  this  case  of  insufficient 
resolution,  the  maximum  bias  error  was  observed  to  be 
independent  of  the  number  of  segments  averaged;  that  is, 
the  estimator  is  biased  as  N-*-®  when  the  number  of  data 
points  per  segment  is  small. 

When  large  amounts  of  data  are  used,  as  in  the  case 
of  a computer  simulation,  better  resolution  can  be 
obtained  without  loss  of  averaging  (variance  reduction) 
capability.  However,  when  the  data  are  of  limited 
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Figure  B-5  Example  of  Biased  Estimates  of  C Due  to  Inadequate  Frequency 
Resolution 


duration  thon--depondenL  on  the  lentJth  of  actual  data 
cuts  and  the  stationarity  of  events  over  that  duration — 
another  method  can  be  employed  to  improve  MSC  estimation 
in  the  face  of  rapidly  changing  phase  angles.  These 
rru'Lhods  ar<'  ref<7rred  to  a.s  alignment,  or  translation 
methods,  and  are  used  to  remove  the  time  delay  or  group 
delay  of  a filter.  Translation  (that  is,  prefiltering 
by  a single  time  delay)  of  one  time  series  with  respect 
to  another  permits  the  rate  of  change  of  the  phase  in  a 
particular  frequency  band  to  be  controlled  and  reduced 
to  yield  better  MSC  estimates  in  that  frequency  band. 

The  implication  is  that  MSC  estimates  are  valid  in 
frequency  bands  where  the  phase  has  little  or  no  slope. 
Various  methods  for  estimating  the  time  delays  are 
discussed  in  Chapter  4. 

Translation  was  applied  to  align  the  time  series 
for  the  example  presented  here.  After  alignment, 
unbiased  estimates  were  obtained  in  a 20  Hz  band  about 
30  Hz;  however,  as  expected,  outside  that  band,  biases 
were  severe,  making  interpretation  meaningless.  In 
general,  translation  must  be  applied  for  "all"  time 
delays  and  the  results  combined  into  one  result  (graph); 
hence,  when  sufficient  data  are  available,  the  author's 
preference  is  for  finer  frequency  resolution  rather  than 
for  the  piecemeal  approach  which  may  be  dictated  for 
reasons  of  limited  data  or  limited  stationarity.  In  the 
latter  case  (of  F sufficiently  large),  C will  not  depend  on  D. 
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The  example  used  here  exhibited  biases  of  one 
tenth  (see  Figure  B-5);  furthermore  the  trend  was 
clearly  indicative  of  the  fact  that  any  bias  (less 
than  one)  could  be  expected  with  insufficient  frequency 
resolution  even  when  as  many  as  64  independent  data 
segments  have  been  processed  (Carter  (1972b)).  The 
practical  implication  of  this  limitation  is  that  it  is 
highly  desirable  that  the  actual  number  of  data  points 
per  segment,  P,  be  large.  For  a finite  duration  data 
set,  this  will  mean  increased  instability  in  the 
estimator  (that  is,  smaller  N and  hence  larger  variance). 
It  should  be  noted  that  one  cannot  simply  increase  P 
by  adding  zeros  or  by  increasing  the  sampling  rate 
of  the  original  data,  for  then  no  additional  information 
content  is  added.  Quite  the  contrary,  the  minimum  data 
sampling  rate  should  be  selected,  for  this  ensures 
the  maximum  amount  of  actual  time  per  data  segment  for 
a given  value  of  P.  Good  resolution,  that  is,  large 
P,  apparently  requires  computation  of  a large  size  FFT. 

An  alternative  computation  that  reduces  the  required 
FFT  size  is  the  PAM-CZT  (Appendix  A). 

The  results  of  the  first  simulation  show  two  critical 
things:  first,  when  estimating  MSC  (or  any  spectral 

quantities)  it  is  important  to  use  both  smooth  weighting 
functions  and  adequate  frequency  resolution.  Second, 
simulation  experiments  to  validate  expressions  for  bias 
of  C can  give  misleading  results  due  to  the  sensitivity 


of  the  four  fundamental  assumptions  upon  which  the  theory 
rests.  Another  difficulty  in  experimentally  estimating 
bias  is  that  when  the  assumptions  do  hold,  the  bias  is 
a small  quantity  to  measure.  For  example,  for  - 
C=0.3,  N=32,  we  find  B(C,N)=0. 0156 . However,  the 

/N 

standard  deviation  of  C is  approximately  0.3.  (See 
Section  3 of  Appendix  B.)  Thus  a large  nvimber  of 
independent  trials,  in  each  of  which  C is  computed, 
must  be  used  in  order  to  obtain  a sample  mean  that  has 
statistical  significance.  We  use  10,000  different 
independent  trials  at  each  value  of  C*0  (.1).9;  the 
results  of  Benlgnus  (1969a)  employed  less  than  1,000 
trials. 

Lastly,  the  smallness  of  the  bias  dictates  that  the 
desired  value  of  C be  accurately  realized  in  the  simulation. 
As  an  exaiiiple  of  the  danger  of  not  doing  so,  consider 
the  following:  suppose  we  believe  we  have  generated 

processes  with  a desired  coherence  of  0.300,  aid 
subsequently  observe  a sample  mean  of  0.315;  in  such  a 
situation,  the  estimated  bias  is  0.015.  But  if  the 
generated  coherence  is  not  precisely  under  the 
experimenter's  control  and  is  off  by  only  1 percent 
(giving  rise  to  a true  coherence  in  this  example  of 
0.303),  then  the  bias  should  have  been  reported  as 
0.315-0.303*0.012.  Thus,  a 1 percent  error  in  true 
coherence  gives  rise  to  a 25  percent  error  in  estimated 
bias  in  this  example.  We  generate  our  correlated 
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processes  according  to 


where  a(t) 
processes , 


x(t)  ■=  a(t)  , 
y(t)  = b(t)  + g a(  t) , 
and  b(t)  are  uncorrelated  comp 


(B-17a) 
(B-17b) 
lex  Gaussian 


and 


fC  (B-18) 

e = \-iZc  ' 

The  statistical  characteristics  of  C in  (B-1)  are 
derived  on  the  fact  that  X(k)  and  Y(k)  are  Gaussian. 

This  will  be  the  case  if  x(t)  and  y(t)  are  Gaussian; 
however,  the  essence  of  the  theory  does  n^  require 
X(k)  and  Y(k)  to  be  DFT  outputs  but  merely  complex 
Gaussian  random  variables.  Thus,  we  can  simply  avoid 
the  issues  of  weighting  and  frequency  resolution  by 
simulating  the  DFT  outputs  directly:  this  technique 
reduces  the  cost  of  the  experiment  (and  indeed  will 
verify  the  theory).  The  essential  features  of  the 

simulation  are  given  in  Figure  B-6. 

The  results  of  the  simulation  for  N=4  are  superposed 


In  Figure  B-7  on  the  exact  bias  curve. 

In  particular,  the  sample  mean  of  10,000  independent 
trials  at  each  value  of  C»0(.1..9  is  plotted,  along  with 
1 vertical  bar  between  the  +0  points  of  the  random 
variable.  In  seven  out  of  the  ten  cases  of  selected 
use,  the  « points  bracket  the  theoretical  curve,  and 
the  remaining  three  out  of  ten  are  included  within  the 


■ On  r»r\i 


The  nnssibility  of  (B-15)  falling  wlthii 
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Figure  B-6  Flow  Diagrajp  for  Empirical  Determination 
of  Bias  of  C;  N»4 ; 10,000  Trials 
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these  tolerances  is  completely  ruled  out.  Thus,  the 
simulation  confirms  the  theoretical  result  in  (B-11) 
and  rules  out  the  approximation  in  (B-15). 

Since  we  have  a simulation  technique  which  corroborates 
the  theory  so  well,  it  is  possible  to  employ  it  to 
investigate  other  more  complicated  functions  of  C 
which  are  very  difficult  (if  not  impossible)  analytically. 
In  particular,  we  use  a bootstrap  idea  based  upon  that 
of  Benignus  (1969a)  in  an  attempt  to  reduce  the  bias 
of  the  coherence  estimate.  Namely,  we  consider  a 
modified  estimate  of  MSC  as 


A 

A 


c 


* max 


(B-19) 


where  we  have  estimated  the  bias  by  means  of  (B-13)  and 
the  initial  estimate  C of  MSC.  The  reason  for  the 
0 in  (B-19)  is  that  we  are  unwilling  to  accept  negative 
estimates  of  coherence.  (Without  the  0 in  (B-19)  we 
can  reduce  the  bias  further  at  the  expense  of  added 

A 

variance.)  The  estimated  bias  and  variance  of  C and 

A 

C are  presented  in  Table  B-1.  It  is  observed  that  the 

A 

bias  of  C is  significantly  reduced.  However,  the 
variance  is  increased.  In  fact,  the  estimated  mean 
square  error  (MSE)  (which  equals  the  variance  plus  the 
square  of  the  bias)  is  presented  in  Table  B-1  and  is 

A A 

greater  for  C than  for  C when  C is  greater  than  0.3; 
the  opposite  behavior  holds  when  C is  less  than  0.3. 
(For  N=4,  C=0.3  is  the  crossover.)  Thus,  the  choice 


.005  -.0006  .006  .008  .006  .008 


between  the  two  estimators,  C and  C,  depends  on  whether 


one  is  bothered  more  by  bias  or  MSE. 


For  larger  N,  the  crossover  value  of  C,  at  which  C 

A 

A 

or  C has  less  MSE,  decreases.  For  example,  at  N=8, 


it  was  observed  to  occur  at  C=0.2.  Thus,  for  practical 


useful  values  of  N (which  are  usually  much  larger  than 

A 

1),  the  estimator  C will  have  less  MSE  than  C over  almost 


the  whole  range  of  C and  will  probably  be  preferred. 


Also,  the  bias  is  quite  small  for  large  N.  The  variance 

A 

of  C is  discussed  in  the  next  section.  Under  the 


assumptions  of  smooth  weighting  functions  and  adequate 


frequency  resolution,  we  will  see  variance  is  a more 


significant  problem  than  bias.  However,  as  seen  in  this 


section,  when  the  assumptions  are  violated,  the  bias 


can  be  a significant  source  of  estimation  error. 


B3.  Variance  of  C 


An  exact  expression  for  the  variance  of  C is 
Carter  (1972a): 

2 (l-c)^ 


V = 


N(N 


3F2^3,  N,  N;N  + 2,  1;  c) 


. (B-20) 

(B-20)  is  plotted  in  Figui e B-8.  For  the  special  case 


of  C=0, 


V = 


N(N  + 1) 


- (r- 


N-1 

T 

N (N  + 1) 


C=0 


(B-21a) 


, for  large  N and  C=0. 


(B-21b) 


ce 


Fcr  large  N and  CfO, 


which  has  a maximum  value  of  8/27N=0.30/N  at  C=l/3. 

Thus  the  maximum  variance  is  always  less  than  0.30/N 
regardless  of  the  value  of  C.  Hence,  the  variance  of 
the  estimator  in  the  case  where  C is  unknown  (but 
nonzero)  decreases  inversely  proportional  to  N.  We 
note,  by  inspecting  (B-20),  that  for  larger  and  larger 
N,  (B-22)  becomes  a better  and  better  approximation. 

Since,  in  general,  we  do  not  know  the  true  value  of 
MSC,  we  select  N based  on  a worst  case  (maximum  variance) 
analysis. 

Provided  we  have  used  good  weighting  functions  and 
good  frequency  resolution,  the  variance  has  a more 
serious  effect  than  bias.  For  example,  if  C*l/3  and 
N=100,  then  the  bias  of  C is  less  than  0.01,  while  the 
standard  deviation  (square  root  of  the  variance)  is 
approximately  equal  to  .05.  Hence,  even  when  100 
independent  segments  are  processed,  the  MSC  estimate 
still  has  significant  variability. 

B4.  Receiver  Operating  Characteristics  for  a Linearly 
Thresholded  Coherence  Estimation  Detector 

An  algorithm  for  computing  the  ROC,  or  the 
probability  of  detection,  P^,  versus  the  probability 
of  false  alarm,  Pp,  for  a linearly  thresholded  MSC 
estimation  detector  is  presented  together  with  an 
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example  of  a ROC  table^  (Carter  (1976).  A recent 
article  (Gevins,  Yeager,  Diamond,  Spire,  Zeitlin,  and 
Gevins  (1975))  presents  new  results  on  using  linearly 
thresholded  MSC  estimates  to  detect  biomedical 
phenomena.  The  desire  to  establish  a threshold  below 
which  MSC  estimates  are  not  presented  to  a human 
decision  maker  is  an  important  issue  in  certain  areas, 
such  as  brain  wave  analysis  and  sonar,  where  the  volume 
of  sensor  da'^a  is  large.  Fcr  a fixed  amount  of 
averaging  and  a fixed  threshold  value,  E,  in  the 
absence  of  a coherent  source,  there  is  still  a certain 
probability,  Pp,  that  an  MSC  estimate  will  exceed  the 
threshold.  Moreover,  although  the  false  alarm  rate 
can  be  reduced  by  increasing  E,  to  do  so  decreases  Pj^, 
when  a coherent  source  is  present.  How  much  it 
decreases  P^  will  depend  on  the  strength  of  the  coherent 
source,  that  is,  the  true  or  underlying  coherence  that 
is  being  estimated.  This  section  presents  an  algorithm 
for  computing  P^  versus  Pp  for  a specified  amount  of 

A 

averaging  and  underlying  coherence.  The  pdf  of  C, 
when  C»0,  is  (from  (B-3)): 

p(C|N,  C=0)  = (N-l)(l-C)^^"^^  . (B-23) 

^The  idea  for  computing  RCXJ  curves  was  suggested 
to  the  author  by  R.  Trueblood,  Naval  Undersea  Center, 

San  Diego,  California. 
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Hence,  the  piobability  of  false  alarm  is 

E 

P„  - 1 - / (N-l)(l-C)^^"^^dC  (B-24) 

o 
or 

E = 1 - exp[log(Pp)/(N-l)] ; (B-25) 

that  is,  for  a specified  Pp  we  establish  a threshold 
according  to  (B-25).  Now  the  computationally  more 
complex  question  is:  What  probability  of  detection  is 

achieved  for  this  threshold  value  E?  The  answer,  for 
a given  value  of  C,  is 


Pjj  * / p(C|N,C)  dC-1  - P(C<E|N,C)  . (B-26) 

E 

A 

where  P(C£E|N,C)  is  the  cdf.  The  cdf  is  given  by  (B-4), 
namely. 


- , ^-2  Ti-eI*^ 

P(C<E|N,C)  -RE  2^1^"^'  1-N;1;Z)  , (B-27a) 

^ - 


where 


Z = EC 


R-E 


L1=C) 

(T^ 


N 


(B-27b) 

(B-27c) 


2^^  is  the  hypergeometric  function, 


The  hypergeometric  function  is,  in  general,  an  infinite 
series:  however,  for  negative  integers,  it  is  given  by 
equation  (15.4.1)  of  Abramowitz  and  Stegun  (1964)  as 

I 

F (-t,l-N;l;Z)  - E T.  , (B-28a) 

^ ^ k-0 


where 
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k ■ TT^ 


(B-28b) 


Pochhammer ' s Symbol  (z)jj  (P-  256  of  Abramowitz 
and  Stegun  (1964)) 


(z) 


r(z+k) 
k r(z)  ’ 


(B-28c) 


and  where  the  Gamma  function  is  given  by  Hankel's 
Contour  integral  (p.  255  of  Abramowitz  and  Stegun  (1964)) 
as 


r(z) 


(B-28d) 


The  path  of  integration  starts  at  +<»  on  the  real  axis, 
circles  the  origin  in  the  counterclockwise  direction, 
and  returns  to  the  starting  point.  However,  (B-27) 
can  be  computed  without  resort  to  complex  integration 
methods  (even  when  the  real  part  of  z*0)  by  noting  for 
k an  integer  that  Pochhammer 's  Symbol, 


(z)j^  » |z(z+l)(z+2) . . . (z+k~l)  , k>0 

(l  k=0  , (B-29) 

is  the  product  of  k incrementally  increasing  terms. 

Now  in  (B-28b)  when  Z^ZCfO,  the  first  term  Tq=1  and 
the  ratio  of  the  k-th  to  the  (k-l)-st  term  is 


(k-l~t)(k-H-l-N)z 

2 

k 


(B-30) 


Now  each  term  in  the  sura  can  be  computed  from  the 
previous  term  in  a simple  fashion.  Indeed,  the  actual 


computations  can  be  implemented  in  BASIC  on  the 
Hewlett-Packard  9830A  desk  top  calculator  in  less  than 
30  lines  of  code,  Figure  B-9.  For  models  of  the  form 
x(t)  = s(t)+n^(t) 

y(t)  = s(t+D)+n2<t)  , (B-311 

where  s(t),  n^(t),and  n2(t)  are  mutually  uncorrelated, 
and  When  f f) . the  SNR  is 


Gnn<f> 

nn 


More  generally,  if 


x(t)  = z^(t)+nj^(t)  (B-33a 

y(t)  = Z2(t)+n2(t)  , (B-331 

where  z^(t)  is  the  output  of  a linear  filter  Hj^(f) 
excited  by  s(t),  i=l,  2 and  the  noises  are  mutually 
uncorrelated  and  uncorrelated  with  the  signal,  then  it 
can  be  shown  that  (2-86) 

C (f)=c  (f)c^,(f)  ; 

xy  sx  sy 

that  is,  the  coherence  between  two  receivers  is  the 
product  of  the  coherence  between  the  source  and  each 
of  the  individual  receivers  for  the  model  (B-33). 
Substituting  for  the  model  in  (B-33)  results  in 


G_  (f)  G (f) 

^11  2 2 
_ / X*  \ 


C (f) 
xy 


I 
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ICl  N-8 
20  N1=N-1 

30  N2=N-2 

40  A»l-N 
50  C-0.25 

60  PRINT  "THIS  RUN  IS  FOR  N="N"  AND  MSC="C 
70  FOR  Fl-0.04  TO  1 STEP  0.04 
80  E-1-EXP(L0G(F1)/N1) 

90  Z-E*C 

100  C4-(l-E)/(l-Z) 

110  C2-E*((l-C)/(1-Z))+N 

120  S«0 

130  FOR  L-0  TO  N2 
140  C3-C4fL 
150  T=1 

160  F»1 

170  IF  (L“0)  THEN  230 
180  FOR  K-1  TO  L 
190  K1*K-1 

200  T-T*(A+K1)*(K1-L)*Z/(K*K) 

210  F=F+T 

220  NEXT  K 

230  S*S+C3*7 

240  NEXT  L 

250  P»C2*S 

260  FIXED  3 

270  PRINT  E;F1;P,1-P 

280  NEXT  FI 

290  END 


Figure  B-9  Computer  Program  to  Compute  ROC  Tables 


184 


that  SNR  is 


z z 


n n 

Ditli  DjOg 


1/2 


(B-36) 


Hence,  for  models  of  the  form  of  (B-31)  or  (B-33)  if 
we  want  to  look  at  the  0 dB  (or  equal  SNR  case),  we 
must  select 


10  logjg 


• ■ 


(B-37) 


which  implies  C=0.25.  Now  suppose  we  average  for 
only  N=8  Independent  data  segments.  Then  for 
Pp»0. 04(0. 04)1. 00,  the  thresholds,  Pp,  cdf  and  P^ 
are  given  in  Table  B-2.  If  a sufficient  amount  of 
stationary  data  exists,  effective  performance  can  be 
Improved  by  increasing  N;  if  not,  N can  only  be 
Increased  at  the  expense  of  degrading  the  frequency 
resolution  with  its  inherent  difficulties.  For  many 
problems,  N*8  will  be  too  small  and  Pp*0.04  will  be 
too  large  or  the  performance  will  be  desired  for  a 
different  value  (or  family  of  values)  of  C.  Example 
plots  are  given  in  Figures  B-10  and  B-11;  more 
extensive  results  can  be  obtained  by  modifying  the 
program,  Figure  B-9. 
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Table  B-2.  Threshold,  P„,  cdf,  and  P_  for  N=8  and  C=0.  25 

r D 


THIS  RUN  IS  FOR  N==8  AND  MSC=0.25 


0.369 

0.040 

0.606 

0.394 

0.303 

0 . 080 

0.473 

0.527 

0,261 

0.120 

0.389 

0.611 

0.230 

0.160 

0.327 

0.673 

0.205 

0.200 

0.279 

0.721 

0.184 

0.240 

0.240 

0.760 

0.166 

0.280 

0.208 

0.792 

0.150 

0.320 

0.181 

0.819 

0.133 

0.360 

0.157 

0.843 

0.123 

0.400 

0.137 

0.863 

0.111 

0.440 

0.119 

0.881 

0.100 

0.480 

0.104 

0.896 

0.089 

0.520 

0.090 

0.910 

0.079 

0.560 

0.078 

0.922 

0.070 

0.600 

0.066 

0.934 

0.062 

0.640 

0.057 

0.943 

0.054 

0.680 

0.048 

0.952 

0.046 

0.720 

0.039 

0.961 

0.038 

0.760 

0.032 

0.968 

0.031 

0 . 800 

0.025 

0.975 

0.025 

0.840 

0.019 

0.981 

0.018 

0.880 

0.014 

0.986 

0.012 

0.920 

0.009 

0.991 

0 . 006 

0.960 

0.004 

0.996 

0.000 

1.000 

0.000 

1.000 

APPENDIX  C 


COMPUTER  PROGRAM  FOR  SPECTRAL 
AND  TIME  DELAY  ESTIMATION 

This  appendix  is  divided  into  two  sections.  The 

first  section  is  a brief  program  description.  The  second 

section  is  a complete  listing  of  the  main  program  and 

subroutines  necessary  for  program  execution. 

Cl.  Program  Description 
The  main  program  estimates  the  auto  and  cross 
spectral  density  functions.  These  spectral  estimates  are 
used  by  the  subroutine  PRCES  to  estimate  six  different 
AML  estimates  for  time  delay  (See  Table  4-1  of  the  main 
text.)  Facilities  with  spectral  estimation  programs 
can  simply  augment  their  computations  with  a call  to 
PRCES.  Facilities  without  spectral  estimation  algorithms 
will  be  able  to  use  the  programs  listed  in  Section  2 of 
this  appendix.  The  programs  listed  are  intended  to  be 
general  FORTRAN  IV  programs;  they  have  been  compiled 
and  executed  on  the  Univac  1108,  the  Control  Data 
Corporation  (CDC)  6600  and  International  Business  Machine 
(IBM)  360.  The  spectral  estimation  programs  have  been 
used  for  research  projects  by:  Williams  (1971),  Carter 
(1972a)  and  (1972b),  Brady  (1973),  Carter,  Knapp,  and 
Nuttall  (1973a),  Carter,  Nuttall,  and  Cable  (1973), 
Santopietro  (1973),  Carter  and  Knapp  (1975),  and  Appendix  D 
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of  this  dissertation.  These  research  projects  were 
conducted  entirely  on  the  Unlvac  1108  and  a significant 
program  rewrite  was  undertaken  to  make  the  programs 
more  transferable  from  one  computer  system  to  another.^ 
The  programs  as  a complete  data  processing  system 
consist  of  input,  computations  and  display.  We  have 
concentrated  our  rewrite  efforts  on  the  computations; 
both  the  i.'iput  and  display  programs  are  expected  to 
contain  peculiarities  of  the  particular  computer  being 
used.  The  input  and  display  subroutines  are  modular 
so  that  only  a minimum  rewrite  is  required  to  transfer 
the  program  to  another  installation.  The  function  of 
the  input  subroutine  LOAD  is  to  load  the  XX  and  YY  arrays 
with  NNN  data  points.  If  the  data  were  stored  on  logical 
magnetic  tape  number  6 in  binary  format  the  call  to 
LOAD  could  be  replaced  by  the  FORTRAN  statement 
"head  6,  XX(I),  YY(I),  I=i,  NNN" 

The  subroutine  LOAD  listed  in  Section  2 is  used  to 
generate  synthetic  data  for  a suitable  test  case 
(though  not  the  example  for  Appendix  D).  The  display 
subroutine  DPLOT  is  called  either;  (1)  to  initialize  the 
plotter,  (2)  to  plot  the  specified  array,  or  (3)  to 
terminate  plotting.  The  subroutine  listed  in  Section  2 
is  written  for  the  Stromberg  Carlson  4060  plot  system. 

It  must  be  rewritten  for  other  systems.  If  a facility 


The  programs  originally  written  and  documented  by 
C.R. Arnold,  G.C. Carter,  and  J.F.Ferrie,  have  been  rewritten 
and  tested  oy  J.C.Sikorski , G.C. Carter  and  Dr.  R.G. Williams. 
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has  no  plotting  system,  the  subroutine  should  simply  be 
a subroutine  which  returns;  alternatively,  the  subroutine 
could  print  the  XX  array  for  I=ISTRT  to  ISTOP.  Thus, 
for  use  at  a new  site,  two  subroutines  (LOAD  and  DPLOT) 
need  to  be  rewritten. 

The  main  program  also  calls  (in  addition  to 
DPLOT  and  LOAD):  HICMP,  FFT,  LIST,  LIST2,  PRCES , and  LREMV. 

The  subroutine  LREMV  computes  (and  optionally  removes) 
the  linear  trend  and  dc  for  the  input  time  waveforms. 

These  computations  are  performed  for  every  time  segment 
and  are  printed  out  by  the  main  program  as  an  aid  to 
detecting  nonstationarities  or  digitizing  errors.  The 
subroutines  LIST  and  LIST2  are  used  to  print  out  (list) 
results.  The  subroutine  FFT  computes  the  FFT  (see,  for 
example,  Cooley-Tukey  ( 1965 ));  coded  and  listed  by 
Singleton  (1969).  Singleton's  mixed  radix  algorithm  has 
been  shown  by  Ferric  and  Nuttall  (1971)  to  be  significantly 
fasti'r  (though  less  accurate)  than  othi'r  proposed  FFTs . 
Singleton's  600  line  FORTRAN  subroutine  can  be  replaced 
with  shorter  programs  (see,  for  example,  p.  332  of 
Oppenheim  and  Schafer  (1975)).  Because  of  the  availability 
of  Singleton's  listing  in  the  literature,  the  FFT  is  not 
listed  here.  Note  that  the  subroutine  PRCES  and  the  main 
program  presume  that  the  FFT  output  array  is  subscribed 
from  1 to  NPFFT  and  not  from  0 to  (NPFFT-1).  The 
subroutine  PRCES  implement.s  the  six  AML  processors 
given  in  Table  4-1.  The  subrouLi.ie  PRCES  calls  on  the 
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subroutines  FFT  and  DPLOT  (already  discussed). 

Singleton's  subroutine  performs  a mixed  radix  FFT; 
that  is,  the  number  of  data  points  do  not  need  to  be 
integer  powers  of  2 such  as  512,  1024,  2048  and  4096 
but  can  have  factors  of  2's,  3's,  and  5's,  such  as 
1000,  1500,  2000  and  3000.  Numbers  which  can  be 
factored  into  2's,  3's,  and  5's  only  are  called  highly 
composite.  Given  the  FORTRAN  variable  NNN,  the  sub- 
routine HICMP  finds  the  highly  composite  number  closest 
to  (but  greater  than  or  equal  to)  NNN.  The  output  of 
HICMP  is  NEWNNN.  For  some  applications,  the  program 
user  will  want  NEWNNN  to  be  twice  as  large  as  NNN; 
this  is  because  the  main  program  fills  the  data  arrays 
with  zero  from  NNN  + 1 to  NEWNNN.  Such  zero  filling 
is  (theoretically)  required  to  inhibit  the  effect  of 
circular  convolution;  in  practice,  though,  (with 
stochastic  data)  zero  filling  does  not  warrant  the  added 
(doubled)  computational  cost.  If  it  is  desired,  zero 
filling  can  simply  be  achieved  by  adding  one  line  to 
HICMP:  "NEWNNN  = 2*NEWNNN" . 

In  addition  to  calling  several  critical  subrout  i n»,‘s , 
the  main  program  perforins  computations  necessary  to 
estimate  the  spectral  characteristics  of  the  two  waw'f>- 
forms  under  investigation.  The  computations  performed 
r.re  briefly  outlined  in  four  major  steps  in  appendix  A. 

When  the  two  input  waveforms  are  complex,  one  FFT  of 
each  waveform  segment  is  required  as  specified  in 
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Appendix  A.  However,  in  most  (though  not  all)  practical 
data  collection  facilities,  the  input  waveforms  are  real 
(not  complex).  When  x(t)  and  y(t)  are  real,  one  FFT  of 
the  complex  waveform  x(t)  + jy(t)  can  be  computed  and 
quickly  be  manipulated  to  form  the  FFT  of  x(t)  and  the 
FFT  of  y(t).  (See  p.  333-334  ol  Oppf^nhelm  and  Schafer 
(1975);  see  also  p.  271-293  of  Raoiner  and  Rader  (1972).) 
These  observations,  combined  with  (A-3)  give  rise  to 
the  FORTRAN  statements  used  to  estimate  the  spectral 
characteristics  of  x(t)  and  y(t).  The  application  of 
this  theory  reduces  the  computation  time  for  two  real 
waveforms  by  a factor  of  two.  The  tinal  comment 
necessary  before  presenting  the  computer  listings  is  to 
describe  the  input  FORTRAN  variables.  NNN  is  the  number 
of  data  points  per  segment.  ISR  is  the  integer  sampling 
rate  (Hz).  NDSJP  is  the  number  of  disjoint  segments 
in  the  total  time  waveform.  SFX  and  SFY  are  scale 
factors  used  to  adjust  the  (voltage)  level  of  the  input 
waveform  to  correct  for  frequency  independent  attenuations 
in  the  data  collection  and  digitizing  process.  (When  no 
correction  is  desired,  the  user  sets  SFX»SFY=1 . 0 . ) 
when  the  user  desires  the  spectral  estimates  to  appear 
3 dB  higher,  he  sets  SFX=SFY=2. 0. ) With  these  five 
sample  inputs,  the  input  time  data  are  processed.  The 
next  section  gives  a complete  program  and  subroutine 
listing. 
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APPENDIX  D 


EXAMPLE  COMPUTER  RUN  FOR  SPECTRAL  AND 
TIME  DELAY  ESTIMATION 

Theoretical  equations  have  been  derived  in 
Chapter  3 for  ML  estimation  of  time  delay.  A computer 
program  to  achieve  an  AML  estimate  of  delay  is  given 
in  Appendix  C.  The  purpose  of  this  chapter  is  to 
describe  four  example  cases  which  were  run  to  sub- 
stantiate the  theory  and  validate  the  computer  program. 
One  computer  run  was  made  for  each  of  the  cases.  Only 
one  of  the  runs  will  be  explicitly  reported  here.  In 
all  of  the  four  cases  studied,  the  true  delay  was 
set  equal  to  zero  (without  loss  of  generality).  Further, 
the  signal  attenuation  was  set  equal  to  unity  so  that 
(3-1)  becomes 


Xj^(t)  = s(t)+nj^(t) 

(D-la) 

X2(t)  = s(t+D)+n2(t) 

(D-lb) 

D = 0 . 

(D-lc) 

Our  desire  is  to  see  whether  (and  "how  well")  we  can 
estimate  the  (assumed  unknown)  parameter  D,  given  a 
T second  observation  of  Xj^(t)  and  X2(t).  The  variance 
of  the  ML  processor  (as  discussed  after  (3-34))  depends 
on  the  particular  signal  and  noise  spectral  characteristics 
(in  particular,  Moreover,  the  variance  of  the 
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delay  estimate  can  only  be  empirically  determined  by 
resort  to  numerous  (expensive)  computer  runs.  We  have 
not  done  that  here  (but  have  suggested  further  work  in 
this  area  (Chapter  6)).  We  have,  however,  made  four 
computer  runs  for  the  data  cases  synthesized  by 
Figure  D-1.  As  shown  in  the  figure,  the  signal  spectrum 
has  two  nonzero  frequency  bands.  The  bands  are  10  Hz 
wide  centered  at  5 and  50  Hz.  Each  of  the  five  filters 
represented  in  Figure  D-1  is  the  cascade  of  two  sections, 
each  with  a 48  dB/octave  roll  off.  The  noise  generators 
generate  white  noise.  Details  of  the  hardware  are  the 
same  as  described  on  pages  71-72  of  Carter  (1972a). 

The  actual  data  generation  reouired  less  hardware  than 
shown  in  Figure  D-1,  but  the  simulation  is  easier  to 
visualize  by  studying  Figure  D-1  and  is  closer  to  what 
would  be  done  in  a real  time  simulation  of  the  type 
suggested  in  Chapter  6.  In  our  experiment,  we  adjust  the 
SNR  by  adjusting  the  gain  in  Figure  D-1. 

The  digital  outputs  of  the  data  synthesized  are 
stored  on  magnetic  tape  for  use  by  the  computer  program 
(Appendix  C).  Longer  observation  time  is  achieved  by 
reading  more  data  from  the  magnetic  tape.  In  the  four 
example  cases,  the  ML  processor  output  was  examined 
for  two  different  signal  levels  and  two  different 
averaging  times  T.  Expect  for  absolute  SNR  level  all 
four  example  cases  had  the  same  signal  and  the  same 
noise  spectral  densities.  As  expected,  when  the  SNR 
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was  low,  more  averaging  time  was  required  to  extract 
a "good"  delay  estimate;  this  behavior  is  predicted 
by  (3-34).  In  particular,  of  our  four  cases,  the  low 
SNR  and  short  averaging  case  resulted  in  unusable  delay 
estimates.  The  reason  for  this  was  apparent  upon 
inspecting  the  coherence  estimates  used  to  approximate 
the  true  coherence.  As  predicted  in  appendix  B with 
short  averaging  times  (that  is,  small  N),  we  were  unable 
to  detect  a low  coherent  source. 

This  happened  to  our  on3  trial  at  low  SNR  and  short 
averaging;  however,  by  increasing  the  averaging  time, 
an  acceptable  time  delay  estimate  was  obtained.  We 
were  able  to  increase  the  averaging  time  (essentially 
without  bound)  since  the  example  cases  were  using 
laboratory  data. 

The  case  which  we  will  report  in  detail  is  the 
high- coherence,  short -averaging  case.  In  particular, 
the  gain  in  Figure  D-1  is  adjusted  so  that  C*0.6  in  the 
frequency  bands  with  signal  power  and  C*0  in  the  other 
bands.  The  characteristicJi  of  Xj^(t)  and  X2(t)  were 
estimated  from  8 seconds  of  data  with  16  independent 
segments  (each  of  1/2  second  du»-ation,  that  is,  2 Hz 
resolution).  FFTs  of  600  .samples  (1/2  sec  times  1200 
samples/sec)  can  be  performed  using  the  fast  mixed 
radix  FFT  of  Singleton  (1969). 

The  charscteristics  of  the  noise  generators  in 
Figure  D-1  were  essentially  identical.  Thus,  for  the 
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model  (D-1),  G (f)=G  (f),Vf.  The  estimates  of 

*2^2 

G (f)  are  depicted  in  Figure  D-2.  The  estimates  of 
^1^1 

G V were  extremely  similar  and  are  not  repeated. 

*2^2 

The  extent  to  which  Xj^(t)  and  XgCt)  are  similar  is 
measured  by  the  MSC  estimate  in  Figure  D-3.  Since  the 
CC  and  delay  D depend  upon  the  phase,  the  phase  estimates 
are  depicted  in  Figure  D-4 . The  slope  of  the  phase 
estimates  is  an  important  indicator  of  delay^  in  those 
frequency  bands  where  the  MSC  is  strong  (namely,  0-10  Hz 
and  45-55  Hz).  Using  the  algorithm  discussed  in 
Chapter  3 and  the  estimation  techniques  of  appendix  A 
implemented  in  appendix  C,  we  hav^  obtained  the  delay 
estimate  given  in  Figure  D-5.  From  Figure  D-5  we  see 
that  the  GCC  with  ML  weighting  peaks  very  close  to  the 
true  value  of  delay,  namely,  D=0.  A blowup  of  Figure 
D-5  given  in  Figure  D-6  shows  that  the  peak  is  within 
10  msec  of  the  true  value.  Clearly,  the  estimation 
technique  proposed  is  a viable  method  for  estimating 
time  delay. 


^Dimensionally  the  slope  is  the  phase  angle  in 
radians  divided  by  the  frequency  in  radians  per  sec. 
Thus  the  slope  of  the  phase  is  measured  in  seconds. 


Figure  D-3  Estimaterj  of  C (f)  for  Example  Case 


(f)  for  Example  Case 


Rure  D-5  Estimates  of  Time  Delay  Using  ML  Weighting  with  GCC  Processing 


Figure  D-6  Expanded  Figure  D-5  Time  De 
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